![Page 1: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/1.jpg)
MA-250 Probability and Statistics
Nazar Khan, PUCIT
Lecture 5
![Page 2: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/2.jpg)
Measurement Error
• In an ideal world, if the same thing is measured several times, the same result would be obtained each time.
• In reality, there are differences.
  – Each result is thrown off by chance error.
Individual measurement = exact value + chance error
![Page 3: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/3.jpg)
Measurement Error
• No matter how carefully it is made, a measurement could have come out differently.
• If it is repeated, the result will be different.
• But how much different?
  – Simple answer:
    • Repeat the measurement several times.
    • Compute the SD of the repeated measurements.
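The sketch below makes "repeat the measurements and consider the SD" concrete. The readings are made-up numbers for illustration. It assumes the SD uses divisor n, which is what Python's `statistics.pstdev` computes.

```python
import statistics

# Hypothetical repeated readings of the same quantity (made-up numbers).
readings = [10.2, 9.9, 10.1, 10.0, 9.8]

# The average of the repeats, and the SD: roughly how far a single
# reading is typically thrown off by chance error.
avg = statistics.mean(readings)
sd = statistics.pstdev(readings)  # SD with divisor n

print(f"average = {avg:.3f}, SD = {sd:.3f}")
```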
![Page 4: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/4.jpg)
Measurement Error
• Variability in measurements reflects the variability in the chance errors
Individual measurement = exact value + chance error
SD(measurements) = SD(chance errors)
(The exact value is the same in every measurement, so it adds nothing to the spread.)
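The identity SD(measurements) = SD(chance errors) can be checked numerically. In this sketch the exact value and the chance errors are hypothetical numbers chosen for illustration. Each measurement is built as exact value + chance error, and the two SDs (divisor n) are compared.

```python
import statistics

exact_value = 70.0                           # hypothetical true value
chance_errors = [0.3, -0.5, 0.1, 0.4, -0.3]  # hypothetical chance errors

# Each measurement = exact value + chance error.
measurements = [exact_value + e for e in chance_errors]

# Adding the same constant to every number shifts the average
# but leaves the spread unchanged, so the two SDs agree.
sd_measurements = statistics.pstdev(measurements)
sd_errors = statistics.pstdev(chance_errors)
print(f"SD(measurements)  = {sd_measurements:.4f}")
print(f"SD(chance errors) = {sd_errors:.4f}")
```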
![Page 5: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/5.jpg)
Measurement Error
• An outlier can affect the:
  – mean
  – standard deviation
• What if the majority of the data follows a normal curve?
  – The outliers will distort the mean and SD, so the 68-95-99.7 rule might not hold.
• Solution: remove the outliers and then apply the normal approximation.
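This minimal sketch shows how a single outlier drags the mean and inflates the SD. The data are a made-up toy list (the numbers 1 to 10) plus one hypothetical extreme value, 100. They are not taken from the slides.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy data, made up
with_outlier = data + [100]             # one extreme value added

for label, values in [("without outlier", data), ("with outlier", with_outlier)]:
    # SD with divisor n, as in the lecture.
    print(f"{label}: mean = {statistics.mean(values):.2f}, "
          f"SD = {statistics.pstdev(values):.2f}")
```

In this toy case the single outlier increases the SD roughly ninefold.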
![Page 6: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/6.jpg)
Outliers
Outliers
The interval mean ± 1 SD covers about 86% of the data (instead of the ~68% the normal curve predicts), so the normal approximation cannot be used.
![Page 7: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/7.jpg)
Outliers
The interval mean ± 1 SD now covers about 68% of the data, so the normal approximation can be used.
Outliers Removed
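The check behind these two figures is the fraction of data inside mean ± 1 SD. Below is a sketch of that check on hypothetical toy data. The data are uniform, not normal, so after removing the outlier the fraction is 60% rather than 68%. The example only shows how an outlier inflates the SD until the 1-SD band swallows almost everything. The helper name `fraction_within_one_sd` is made up for this example.

```python
import statistics

def fraction_within_one_sd(values):
    """Fraction of values lying within mean ± 1 SD (SD with divisor n)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    inside = [v for v in values if m - s <= v <= m + s]
    return len(inside) / len(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy data, made up
with_outlier = data + [100]             # hypothetical outlier

print(f"with outlier:    {fraction_within_one_sd(with_outlier):.0%}")
print(f"outlier removed: {fraction_within_one_sd(data):.0%}")
```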
![Page 8: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/8.jpg)
Bias
• Chance error changes from measurement to measurement – sometimes positive and sometimes negative.
• Bias affects all measurements in the same way.
Individual measurement = exact value + chance error + bias
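Because bias pushes every measurement the same way, averaging repeated measurements cannot cancel it, and it does not show up in the SD either. In this sketch the exact value, the bias of 0.5, and the chance errors are all hypothetical. The chance errors were chosen to sum to zero, so the average lands exactly on exact value + bias.

```python
import statistics

exact_value = 50.0
bias = 0.5                                   # hypothetical: every reading 0.5 too high
chance_errors = [0.2, -0.1, -0.3, 0.1, 0.1]  # hypothetical; these sum to 0

# Individual measurement = exact value + chance error + bias.
measurements = [exact_value + e + bias for e in chance_errors]

avg = statistics.mean(measurements)
print(f"average - exact value = {avg - exact_value:.3f}")  # the bias remains
print(f"SD(measurements)  = {statistics.pstdev(measurements):.3f}")
print(f"SD(chance errors) = {statistics.pstdev(chance_errors):.3f}")  # bias adds no spread
```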
![Page 9: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/9.jpg)
below.
![Page 10: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/10.jpg)
Dealing with bivariate data
• So far, we have dealt with univariate data:
  – one variable only
  – e.g., age, height, income, family size.
• How can we study relationships between 2 variables?
  – Relationship between the height of a father and the height of his son
  – Relationship between income and education
• Answer: scatter diagrams
![Page 11: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/11.jpg)
Can we summarize the scatter diagram?
![Page 12: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/12.jpg)
Summarizing a Scatter Diagram
• Means (of x and of y)
• Horizontal SD
• Vertical SD
• But these statistics do not measure the strength of the association between the 2 variables.
• How can we summarize the strength of association?
Both figures have the same means and the same horizontal and vertical SDs, but the left figure shows a stronger association between the 2 variables.
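One simple way to build two scatter diagrams with identical means and SDs but different association is to shuffle the y values. This leaves the mean and SD of y untouched but breaks the pairing with x. The two datasets below are hypothetical toy examples, not the data behind the figures.

```python
import statistics

x = [1, 2, 3, 4, 5]
y_strong = [1, 2, 3, 4, 5]  # y rises in step with x
y_weak = [3, 1, 5, 2, 4]    # the same y values, shuffled

for label, y in [("strong", y_strong), ("weak", y_weak)]:
    print(label,
          "| mean x =", statistics.mean(x), "SD x =", round(statistics.pstdev(x), 3),
          "| mean y =", statistics.mean(y), "SD y =", round(statistics.pstdev(y), 3))
```

Both lines print the same four numbers, even though the strength of association differs.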
![Page 13: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/13.jpg)
Correlation
• Correlation measures the strength of linear association between 2 variables.
  – As one increases, what happens to the other?
• Denoted by r.
• r = average of (x in standard units × y in standard units)
Average = 0.4
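The definition r = average of (x in standard units × y in standard units) translates directly into code. This is a minimal sketch. It assumes the SD uses divisor n (`statistics.pstdev`), which keeps it consistent with taking a plain average of the products. It would fail with a division by zero if either variable had SD 0. The helper names `standard_units` and `correlation` are illustrative, and the data are toy values.

```python
import statistics

def standard_units(values):
    """Convert values to standard units: (value - mean) / SD, SD with divisor n."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """r = average of (x in standard units) * (y in standard units)."""
    products = [a * b for a, b in zip(standard_units(x), standard_units(y))]
    return statistics.mean(products)

x = [1, 2, 3, 4, 5]
print(round(correlation(x, [2, 4, 6, 8, 10]), 3))  # perfectly linear pairing
print(round(correlation(x, [3, 1, 5, 2, 4]), 3))   # weak pairing
```

From Python 3.10, the standard library also offers `statistics.correlation`, which computes the same Pearson r.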
![Page 14: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/14.jpg)
How does r measure association strength?
• r = average of (x in standard units × y in standard units)
• When x and y are both above their means, or both below their means, the product of their standard units is positive.
• When positive products dominate, the average of the products is positive (i.e., the correlation r is positive).
• Similarly, when negative products dominate (one variable above its mean while the other is below), r is negative.
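To see the sign argument at work, this sketch lists the products for a small toy dataset (hypothetical values) and tallies how many are positive and negative. Here two positive products (each about 1.0) outweigh one negative product (about -0.5), so r comes out positive.

```python
import statistics

def standard_units(values):
    """(value - mean) / SD, with SD using divisor n."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]

x = [1, 2, 3, 4, 5]  # toy data
y = [3, 1, 5, 2, 4]  # toy data

products = [a * b for a, b in zip(standard_units(x), standard_units(y))]
positive = sum(1 for p in products if p > 0)
negative = sum(1 for p in products if p < 0)

print("positive products:", positive, "total", round(sum(p for p in products if p > 0), 2))
print("negative products:", negative, "total", round(sum(p for p in products if p < 0), 2))
print("r =", round(statistics.mean(products), 2))
```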
![Page 15: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/15.jpg)
Correlation
• r is always between -1 and 1.
• r = 0 means no linear association between x and y.
• |r| close to 1 means strong linear association.
  – r = 1 means a perfectly linear, positive association.
  – r = -1 means a perfectly linear, negative association.
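A quick check of the extreme cases: when y is an exact linear function of x, r equals 1 for a positive slope and -1 for a negative slope, up to floating-point rounding. The slopes and intercepts below are arbitrary choices for illustration.

```python
import statistics

def correlation(x, y):
    """r = average of products of x and y in standard units (SD with divisor n)."""
    mx, sx = statistics.mean(x), statistics.pstdev(x)
    my, sy = statistics.mean(y), statistics.pstdev(y)
    return statistics.mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

x = [1, 2, 3, 4, 5, 6]
up = [3 * v + 2 for v in x]     # perfectly linear, positive slope
down = [-2 * v + 7 for v in x]  # perfectly linear, negative slope

print(round(correlation(x, up), 6))
print(round(correlation(x, down), 6))
```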
![Page 16: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/16.jpg)
Very hard to predict y from x
![Page 17: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/17.jpg)
![Page 18: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/18.jpg)
Easy to predict y from x
![Page 19: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/19.jpg)
Negative association between x and y
![Page 20: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/20.jpg)
Some Properties of the Correlation Coefficient
• r has no units. (Why?)
  – The correlation between June temperatures in Lahore and Karachi is the same whether they are measured in Celsius or Fahrenheit.
• r(x,y) = r(y,x) (Why?)
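Both properties can be checked numerically. Converting to standard units removes the original units, and a change of scale such as C → F (F = 9/5 × C + 32) leaves standard units unchanged. Multiplication is also symmetric, so swapping x and y gives the same products. The temperature values below are invented for illustration; they are not real readings for Lahore or Karachi.

```python
import statistics

def correlation(x, y):
    """r = average of products of x and y in standard units (SD with divisor n)."""
    mx, sx = statistics.mean(x), statistics.pstdev(x)
    my, sy = statistics.mean(y), statistics.pstdev(y)
    return statistics.mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical daily temperatures in °C for two cities (made-up numbers).
city_a_c = [38.0, 40.5, 41.0, 39.5, 42.0]
city_b_c = [33.0, 34.0, 35.5, 34.5, 36.0]

# Express one series in Fahrenheit: F = 9/5 * C + 32.
city_a_f = [9 / 5 * c + 32 for c in city_a_c]

r_c = correlation(city_a_c, city_b_c)        # both in Celsius
r_f = correlation(city_a_f, city_b_c)        # one series in Fahrenheit
r_swapped = correlation(city_b_c, city_a_c)  # x and y swapped
print(round(r_c, 4), round(r_f, 4), round(r_swapped, 4))
```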
![Page 21: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/21.jpg)
![Page 22: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/22.jpg)
Exceptions!
Without the outlier there is a strong linear association, but the outlier brings r down to almost 0.
r measures linear association only, not all kinds of association.
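The sketch below illustrates both exceptions with hypothetical toy data. First, a perfect line plus a single outlier placed at (15, 2.5), which pulls r from 1 down to roughly 0.14. Second, points on the parabola y = x², where y is completely determined by x yet r is 0, because the relationship is not linear.

```python
import statistics

def correlation(x, y):
    """r = average of products of x and y in standard units (SD with divisor n)."""
    mx, sx = statistics.mean(x), statistics.pstdev(x)
    my, sy = statistics.mean(y), statistics.pstdev(y)
    return statistics.mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Toy data: a perfect line, then the same line plus one outlier at (15, 2.5).
x_line, y_line = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
x_out, y_out = x_line + [15], y_line + [2.5]

# Toy data with a perfect but non-linear relationship: y = x**2.
x_par = [-2, -1, 0, 1, 2]
y_par = [v ** 2 for v in x_par]

print("line:          ", round(correlation(x_line, y_line), 3))
print("line + outlier:", round(correlation(x_out, y_out), 3))
print("parabola:      ", round(correlation(x_par, y_par), 3))
```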
![Page 23: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/23.jpg)
Association is not Causation!
• Correlation measures association, but association is not causation.
  – In children, shoe size and reading skill have a strong positive linear association. Does a larger foot improve your reading skills?
![Page 24: MA-250 Probability and Statistics Nazar Khan PUCIT Lecture 5](https://reader036.vdocument.in/reader036/viewer/2022062407/56649e545503460f94b4b6c7/html5/thumbnails/24.jpg)
Summary
• Measurement errors
  – Chance error
  – Bias
• SD(chance errors) = SD(measurements)
  – This lets us judge whether an error is due to chance or not.
• Correlation measures the strength of linear association between 2 variables.
  – Always between -1 and 1
• Not useful for summarizing scatter diagrams with:
  – outliers, or
  – non-linear association.
• Association is not causation.