Recap of Basic Statistics

Basic Statistics

Upload: harun-rasheed

Posted on 28-Jan-2016


DESCRIPTION

How to analyse regression results, etc.

TRANSCRIPT

Page 1: Recap of Basic Statistics

Basic Statistics

Page 2: Recap of Basic Statistics

Estimation

• Population - A set of entities about which some statistical inference is to be drawn. Typically, the population is very large, making a census (a complete enumeration of all the values in the population) impractical or impossible.

• Parameter - A characteristic of the population, such as its mean or variance.

• Sample - A subset of the population, of manageable size, that is meant to represent the population.

• Sampling Technique – A method of selecting a subset of individuals from within a population to estimate characteristics of the whole population. For example: ‘Random Sampling’, ‘Systematic Sampling’, ‘Stratified Sampling’.
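The three sampling techniques listed above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation. The population of 100 numbered units, the `urban`/`rural` strata and the helper names are all hypothetical. The stratified version draws the same number of units from each stratum; proportional allocation is another common choice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic sampling: random start, then every k-th unit (k = N // n)."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified sampling: a simple random sample drawn within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

population = list(range(1, 101))                    # hypothetical units 1..100
strata = {"urban": population[:60], "rural": population[60:]}  # hypothetical strata

print(simple_random_sample(population, 5, seed=1))
print(systematic_sample(population, 5, seed=1))
print(stratified_sample(strata, 3, seed=1))
```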

Page 3: Recap of Basic Statistics

Estimation

• Statistic – A characteristic of a sample, such as its mean or variance, comparable to the corresponding population characteristic. For another sample, the values may vary.

• Estimator - Any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean is an estimator of the population mean. If the value of the estimator in a particular sample is found to be 5, then 5 is the estimate of the population mean µ.

• Random/Stochastic Variable - A variable whose values vary by chance, with a probability attached to each possible value; hence it follows a probability distribution.

• Sampling Distribution - The probability distribution of a statistic when a random sample is drawn from a population; it describes the probabilities associated with the values the statistic can take.
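A small simulation makes the estimator and sampling-distribution ideas concrete. It assumes a hypothetical, normally distributed population of 10,000 values. Each sample mean is one estimate of µ, and different samples give different estimates. Repeating the draw many times approximates the sampling distribution of the mean, whose spread is close to σ/√n.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 10,000 values drawn from Normal(mean=50, sd=10)
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)        # population parameter (mean)
sigma = statistics.pstdev(population)    # population SD (divisor N)

def sample_mean(n):
    """The estimator: the mean of one random sample of size n."""
    return statistics.fmean(rng.sample(population, n))

# Two samples of the same size give two different estimates of mu
print(sample_mean(25), sample_mean(25))

# Many repeated samples approximate the sampling distribution of the mean
n = 25
means = [sample_mean(n) for _ in range(2_000)]
print("population mean     :", round(mu, 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means  :", round(statistics.stdev(means), 2),
      "vs sigma/sqrt(n) =", round(sigma / n ** 0.5, 2))
```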

Page 4: Recap of Basic Statistics

Estimation

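• Degrees of Freedom (dof) – the number of values in the final calculation of a statistic that are free to vary. Each parameter estimated from the same sample uses up one degree of freedom: if the sample size is ‘n’ and k parameters are estimated, the statistic has n-k dof. For example, the sample variance uses the estimated mean, so it has n-1 dof; in a simple regression the intercept and the slope are both estimated, so the residual variance has n-2 dof.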
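Each estimated parameter costs one degree of freedom, and this sets the divisor of a variance estimate. The sketch below has illustrative helper names and made-up data. It computes two quantities:

* the sample variance, which estimates the mean first and so divides by n-1; the result can be checked against `statistics.variance`;
* the residual variance of a least-squares line y = a + b·x, which estimates both a and b first and so divides by n-2.

```python
import statistics

def sample_variance(xs):
    """Mean is estimated first, so only n - 1 deviations are free: divide by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def residual_variance(xs, ys):
    """Simple regression y = a + b*x + u: a and b are estimated, so divide by n - 2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # OLS slope
    a = my - b * mx                    # OLS intercept
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return rss / (n - 2)

data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(sample_variance(data), statistics.variance(data))  # both divide by n - 1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print(residual_variance(xs, ys))
```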

Page 5: Recap of Basic Statistics

Hypothesis Testing

• Null Hypothesis H0 - A statement about the population that one wants to test, concerning the occurrence/prevalence/existence of some property. Generally, the statement is written so that it assumes the non-occurrence of the effect. For example, ‘two samples are not different from each other’ (i.e. they come from populations with the same mean).

• Alternative Hypothesis H1 - The hypothesis that competes with the null hypothesis. For example, in the above case it is ‘two samples are different from each other’.

• Critical Values – Used as a reference point for acceptance or rejection of the null hypothesis. For a two-sided test, if the absolute value of the test statistic computed under the null hypothesis is less than the critical value, the null is accepted (not rejected); if it exceeds the critical value, the null is rejected.

• Type I error - Probability of rejecting a TRUE null.
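As an illustration of the critical-value rule, the sketch below tests the example null ‘two samples are not different from each other’, read as equal population means. To stay within the standard library it uses a two-sample z-test. It assumes the common population standard deviation σ is known; with σ estimated from the data, a t critical value would be used instead. The data and the function name are made up.

```python
from statistics import NormalDist, fmean

def two_sample_z_test(x, y, sigma, alpha=0.05):
    """Two-sided test of H0: the two population means are equal.

    Assumes independent samples from normal populations with a KNOWN
    common standard deviation sigma (a simplifying assumption).
    """
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
    return z, z_crit, decision

x = [10.2, 9.8, 10.5, 10.1, 9.9]     # hypothetical sample 1
y = [11.0, 11.3, 10.8, 11.1, 11.2]   # hypothetical sample 2
print(two_sample_z_test(x, y, sigma=0.5))
```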

Page 6: Recap of Basic Statistics

Hypothesis Testing

• Type II error - Probability of accepting (failing to reject) a FALSE null, denoted β.

• Level of Significance - The significance level of a statistical hypothesis test is a fixed probability (e.g. 0.10, 0.05, 0.02 or 0.01) of wrongly rejecting the null hypothesis H0 if it is in fact true. It is denoted α, so P(Type I error) = α, and 1 - α is the confidence coefficient.

• p-values – The probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one actually observed. It is the smallest significance level at which H0 would be rejected. It is compared with the chosen significance level α of the test and, if it is smaller, the result is significant. That is, if the null hypothesis is rejected at the 5% significance level, this would be reported as "p < 0.05". Small p-values indicate that the observed data would be unlikely if the null were true; the smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of evidence against H0, rather than simply concluding "Reject H0" or "Do not reject H0".
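The sketch below has two parts. The first computes a two-sided p-value for a z statistic. The second is a simulation that draws many samples for which H0 (mean = 0) is true and records how often p < α. The data are hypothetical standard normal draws with known σ = 1. The observed rejection rate, i.e. the Type I error rate, should come out close to α.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def two_sided_p_value(z):
    """P(|Z| >= |z|) when H0 is true, so that Z ~ N(0, 1)."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulated_type_i_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Draw samples for which H0 (mean = 0) is TRUE; count how often p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) * n ** 0.5     # known sigma = 1, so SE = 1/sqrt(n)
        if two_sided_p_value(z) < alpha:
            rejections += 1
    return rejections / trials

print(round(two_sided_p_value(1.96), 3))  # about 0.05
print(simulated_type_i_rate())            # should be close to alpha = 0.05
```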

Page 7: Recap of Basic Statistics

Hypothesis Testing

• Power - It measures the test's ability to reject the null hypothesis when it is actually FALSE - that is, to make a correct decision. In other words, it is the probability of not committing a type II error. It is calculated by subtracting the probability of a type II error from 1, usually expressed as: Power = 1 - P(type II error) = 1- β. The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to have high power, close to 1.
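For a two-sided z-test of H0: µ = µ0 with known σ, power has a closed form in terms of the standard normal CDF Φ. With $\delta = (\mu_1 - \mu_0)\sqrt{n}/\sigma$, it is $1 - \beta = 1 - \Phi(z_{\alpha/2} - \delta) + \Phi(-z_{\alpha/2} - \delta)$. The sketch below assumes a known σ and a hypothetical true mean µ1 = 0.5. It shows power rising with the sample size n. When the true mean equals µ0, the same formula returns α.

```python
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of the two-sided z-test of H0: mu = mu0 when the true mean is mu1."""
    phi = NormalDist().cdf
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    delta = (mu1 - mu0) * n ** 0.5 / sigma     # mean of the z statistic under H1
    return (1 - phi(z_crit - delta)) + phi(-z_crit - delta)

for n in (10, 30, 100):
    print(n, round(z_test_power(mu0=0, mu1=0.5, sigma=1, n=n), 3))
```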

Page 8: Recap of Basic Statistics

Critical Region α = 5%

[Figure: density f(t) of the t statistic. The central acceptance region (95%) lies between $-t_{0.05/2}$ and $t_{0.05/2}$; the critical regions in the two tails hold 2.5% each, 5% in total.]

Accept the null H0: β = 0 if $t = b/SE(b)$ lies between $-t_{0.05/2}$ and $t_{0.05/2}$ at the 5% level of significance; otherwise reject at 5%.
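The t-ratio rule can be sketched as a small function. Python's standard library has no t quantile function, so the critical value is passed in as an argument. The coefficient b = 0.84, its standard error 0.35 and the 20 residual dof are hypothetical. The value 2.086 is the two-sided 5% critical value from a standard t table for 20 dof.

```python
def t_ratio_test(b, se_b, t_crit):
    """Two-sided test of H0: beta = 0 using t = b / SE(b).

    t_crit is the table value t_{alpha/2} for the residual degrees of freedom.
    """
    t = b / se_b
    accept = -t_crit <= t <= t_crit      # t inside the acceptance region
    return t, ("accept H0" if accept else "reject H0")

# Hypothetical regression output: b = 0.84, SE(b) = 0.35, 20 residual dof
print(t_ratio_test(0.84, 0.35, t_crit=2.086))
```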

Page 9: Recap of Basic Statistics

Critical Region α = 5%

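[Figure: density of the estimate, with the 95% acceptance region running from $b - t_{0.05/2}\,SE(b)$ to $b + t_{0.05/2}\,SE(b)$ and critical regions of 2.5% in each tail, 5% in total; the hypothesized value β = 0 is marked on the axis.]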

Accept the null H0: β = 0 if the hypothesized value 0 lies within $b \pm t_{0.05/2}\,SE(b)$ at the 5% level of significance; otherwise reject at 5%.
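The interval version of the decision can also be sketched as a small function. It builds $b \pm t_{0.05/2}\,SE(b)$ and checks whether the hypothesized value lies inside. The inputs are hypothetical: b = 0.84, SE(b) = 0.35, and the critical value 2.086 from a t table for 20 dof. The `beta0` parameter is an illustrative generalization to H0: β = β0 and defaults to 0.

```python
def ci_test(b, se_b, t_crit, beta0=0.0):
    """Build b +/- t_crit * SE(b); accept H0: beta = beta0 if beta0 lies inside."""
    lower, upper = b - t_crit * se_b, b + t_crit * se_b
    accept = lower <= beta0 <= upper
    return (lower, upper), ("accept H0" if accept else "reject H0")

# Hypothetical: b = 0.84, SE(b) = 0.35, t_{0.05/2} = 2.086 for 20 dof
print(ci_test(0.84, 0.35, t_crit=2.086))
```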

Page 10: Recap of Basic Statistics

Critical Region α = 1%

[Figure: density f(t), equivalently f(β). The 99% acceptance region lies between $-t_{0.01/2}$ and $t_{0.01/2}$, i.e. between $b - t_{0.01/2}\,SE(b)$ and $b + t_{0.01/2}\,SE(b)$; the critical regions hold 0.5% in each tail, 1% in total.]

Accept the null H0: β = 0 if 0 lies within $b \pm t_{0.01/2}\,SE(b)$ or, equivalently, if $|t| = |b/SE(b)| \le t_{0.01/2}$, at the 1% level of significance; otherwise reject at 1%.
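The interval rule and the t-ratio rule always give the same decision. For SE(b) > 0, 0 lies in $b \pm t\,SE(b)$ exactly when $|b| \le t\,SE(b)$, i.e. when $|b/SE(b)| \le t$. The sketch checks this on a few hypothetical (b, SE(b)) pairs. It uses 2.845, the two-sided 1% critical value from a t table for 20 dof.

```python
def rules_agree(b, se_b, t_crit):
    """Compare the interval rule and the t-ratio rule for H0: beta = 0."""
    in_interval = b - t_crit * se_b <= 0 <= b + t_crit * se_b
    small_t = abs(b / se_b) <= t_crit
    return in_interval == small_t

T_CRIT_1PCT_20DOF = 2.845   # two-sided 1% value from a t table, 20 dof
cases = [(0.84, 0.35), (1.10, 0.35), (-0.90, 0.25), (0.05, 0.40)]  # hypothetical
for b, se_b in cases:
    print(b, se_b, round(b / se_b, 2), rules_agree(b, se_b, T_CRIT_1PCT_20DOF))
```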

Page 11: Recap of Basic Statistics

Critical Region α = 10%

[Figure: density f(t), equivalently f(β). The 90% acceptance region lies between $-t_{0.10/2}$ and $t_{0.10/2}$, i.e. between $b - t_{0.10/2}\,SE(b)$ and $b + t_{0.10/2}\,SE(b)$; the critical regions hold 5% in each tail, 10% in total.]

Accept the null H0: β = 0 if 0 lies within $b \pm t_{0.10/2}\,SE(b)$ or, equivalently, if $|t| = |b/SE(b)| \le t_{0.10/2}$, at the 10% level of significance; otherwise reject at 10%.
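Because the critical value grows as α shrinks, the same coefficient can be significant at 10% but not at 1%. The sketch below uses standard normal critical values (about 1.645, 1.960 and 2.576) as a large-sample stand-in for $t_{\alpha/2}$. This approximation is an assumption and is reasonable only when the residual dof are large. The coefficient and its standard error are hypothetical.

```python
from statistics import NormalDist

def decisions_by_level(b, se_b, levels=(0.10, 0.05, 0.01)):
    """Two-sided decision on H0: beta = 0 at several significance levels.

    Uses standard normal critical values as a large-sample approximation
    to t_{alpha/2}; only reasonable when the residual dof are large.
    """
    t = b / se_b
    out = {}
    for alpha in levels:
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.645, 1.960, 2.576
        out[alpha] = "reject H0" if abs(t) > z_crit else "accept H0"
    return t, out

# Hypothetical coefficient with t = 2.2
print(decisions_by_level(0.44, 0.20))
```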

Page 12: Recap of Basic Statistics

Chi-sq, t & F statistics