stat


Uploaded by: cjanero

Posted on 22-Oct-2014


Would You Change the Channel?

A survey by a known organization found that 45% of the people who were offended by a television program would change the channel, while 15% would turn off their television sets. The survey further stated that the margin of error is 3 percentage points and that 4000 adults were interviewed.

Several questions arise:
1. How do these estimates compare with the true population percentage?
2. What is meant by a margin of error of 3 percentage points?
3. Is a sample of 4000 large enough to represent the population of all adults who watch television in the Philippines?

STATISTICAL INFERENCE: Estimation

Estimation is the process of estimating the value of a parameter from information obtained from a sample.

Two Types of Estimates
1. Point estimate: a specific numerical value used to estimate a parameter.
2. Interval estimate: an interval or range of values used to estimate a parameter. This estimate may or may not contain the value of the parameter being estimated.

Three Properties of a Good Estimator
1. The estimator should be an unbiased estimator (its expected value equals the parameter being estimated).
2. The estimator should be consistent: as the sample size increases, the value of the estimator approaches the value of the parameter being estimated.
3. The estimator should be a relatively efficient estimator (it has the smallest variance).

Confidence level is the degree of assurance that a particular statistical statement is correct under specified conditions.

Confidence interval is a specific interval estimate of a parameter, determined from sample data and the specified confidence level of the estimate.

Significance level is the degree of uncertainty about the statistical statement under the same conditions used to determine the confidence level. Significance levels are symbolized by α (alpha). Mathematically,

$$\text{confidence level} + \text{significance level} = 1, \qquad \text{so confidence level} = 1 - \alpha.$$

For example, a 95% confidence level corresponds to α = 0.05.

Confidence Intervals
A confidence interval is used to estimate a range of possible values of a parameter rather than a single value.
When you use a confidence interval instead of a point estimate, you lose a degree of precision, but you gain a large degree of confidence. In general form:

$$\text{lower limit} < \text{parameter} < \text{upper limit}$$

where:
lower limit = point estimate − error of estimate
upper limit = point estimate + error of estimate

Formula for the Confidence Interval of the Mean for a Specific α (σ known or n ≥ 30)

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The term $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ is the maximum error of estimate.

Maximum error of estimate is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.

Examples:
1. A researcher wishes to estimate the average amount of money a person spends on lottery tickets each month. A sample of 50 people who play the lottery found the mean to be 19 dollars and the standard deviation to be 6.8 dollars. Find the best point estimate of the population mean and the 95% confidence interval of the population mean.
2. A survey of 30 adults found that the mean age of a person's primary vehicle is 5.6 years. Assuming the standard deviation of the population is 0.8 year, find the best point estimate of the population mean and the 99% confidence interval of the population mean.

Formula for the Confidence Interval of the Mean for a Specific α (σ unknown and n < 30)

$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

The degrees of freedom (df) are n − 1.

STATISTICAL HYPOTHESIS TESTING

How much better is better? Suppose a school superintendent reads an article which states that the overall mean entrance exam score is 85. Furthermore, suppose that, for a sample of students, the average of the entrance exam scores in the superintendent's school district is 88. Can the superintendent conclude that the students in his school district scored higher than the average?

Questions arise:
1. Is there a real difference in the means?
2. Is the difference simply due to chance?

A statistical hypothesis is an assertion or conjecture concerning one or more populations. This conjecture may or may not be true.
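Returning to the two confidence-interval examples in the estimation section, here is a minimal Python sketch that computes them. The helper name `z_interval` is hypothetical; it takes the exact $z_{\alpha/2}$ from `statistics.NormalDist` instead of a printed table (so 1.96 and 2.576 rather than 1.96 and 2.58), and for Example 1 it uses the sample standard deviation in place of σ, as is usual when n ≥ 30. The best point estimates are simply the sample means, 19 dollars and 5.6 years.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sd, n, confidence):
    """Confidence interval x_bar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} from the standard normal
    error = z * sd / sqrt(n)                 # maximum error of estimate E
    return x_bar - error, x_bar + error


# Example 1: lottery spending, n = 50, mean 19, s = 6.8, 95% confidence
lo1, hi1 = z_interval(19, 6.8, 50, 0.95)
print(f"{lo1:.1f} < mu < {hi1:.1f}")  # 17.1 < mu < 20.9

# Example 2: vehicle age, n = 30, mean 5.6, sigma = 0.8, 99% confidence
lo2, hi2 = z_interval(5.6, 0.8, 30, 0.99)
print(f"{lo2:.1f} < mu < {hi2:.1f}")  # 5.2 < mu < 6.0
```

With table values (1.96 and 2.58) the hand calculation gives the same intervals to one decimal place.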

Types of Hypothesis
1. Null hypothesis ($H_0$) is the hypothesis that is being tested; it represents what the experimenter doubts to be true.
2. Alternative hypothesis ($H_a$) is the operational statement of the theory that the experimenter believes to be true and wishes to prove. It contradicts the null hypothesis and specifies the existence of a difference or a relationship. It is non-directional when it states only that a difference exists (two-tailed test) and directional when it states the direction of the difference (one-tailed test), as the situations below illustrate.

Illustration of how hypotheses should be stated:

Situation A: A medical researcher is interested in finding out whether a new medication will have any undesirable side effects. The researcher is particularly concerned with the pulse rate of the patients who take the medication. Will the pulse rate increase, decrease, or remain unchanged after a patient takes the medication? Since the researcher knows that the mean pulse rate for the population under study is 82 beats per minute, the hypotheses for this situation are

$$H_0: \mu = 82 \qquad H_a: \mu \neq 82$$

The null hypothesis specifies that the mean will remain unchanged, and the alternative states that it will be different. This test is called a TWO-TAILED TEST.

Situation B: A chemist invents an additive to increase the life of an automobile battery. If the mean lifetime of the automobile battery without the additive is 36 months, then the hypotheses are

$$H_0: \mu = 36 \qquad H_a: \mu > 36$$

In this situation, the chemist is interested only in increasing the lifetime of the batteries, so her alternative hypothesis is that the mean is greater than 36 months. This test is called a RIGHT-TAILED TEST.

Situation C: A contractor wishes to lower heating bills by using a special type of insulation in houses. If the average of the monthly heating bills is 500 pesos, her hypotheses about heating costs with the use of insulation are

$$H_0: \mu = 500 \qquad H_a: \mu < 500$$

This test is called a LEFT-TAILED TEST.

Summary (k is the hypothesized value of the mean):
Two-tailed test: $H_0: \mu = k$, $H_a: \mu \neq k$
Right-tailed test: $H_0: \mu = k$, $H_a: \mu > k$
Left-tailed test: $H_0: \mu = k$, $H_a: \mu < k$

Exercises: State the null and alternative hypotheses for each conjecture.
A. A researcher thinks that if expectant mothers use vitamin pills, the birth weight of the babies will increase. The average birth weight of the population is 8.6 pounds.
B. An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of compact disks by using robots instead of humans for certain tasks. The mean number of defective disks per 1000 is 18.
C. A psychologist feels that playing soft music during a test will change the results of the test. The psychologist is not sure whether the grades will be higher or lower. In the past, the mean of the scores was 73.

Solution:
A. $H_0: \mu = 8.6$, $H_a: \mu > 8.6$ (right-tailed)
B. $H_0: \mu = 18$, $H_a: \mu < 18$ (left-tailed)
C. $H_0: \mu = 73$, $H_a: \mu \neq 73$ (two-tailed)

Test statistic is a statistic whose value is calculated from sample measurements and on which the statistical decision will be based.

Types of Error
1. Type I error is the error made by rejecting the null hypothesis when it is true. The probability of a Type I error is denoted by α.
2. Type II error is the error made by accepting (not rejecting) the null hypothesis when it is false. The probability of a Type II error is denoted by β.

Level of significance (α) is the maximum probability of committing a Type I error that the researcher is willing to accept. Three commonly used levels are:
a. 0.10
b. 0.05
c. 0.01

Critical value (C.V.) separates the critical region from the noncritical region.

Critical region or rejection region is the set of values of the test statistic for which the null hypothesis will be rejected. The acceptance region is the set of values of the test statistic for which the null hypothesis will not be rejected. The acceptance and rejection regions are separated by a critical value of the test statistic.

Finding Critical Values: Find the critical value(s) for each situation and draw the appropriate figure, showing the critical region.
a. A left-tailed test with α = 0.10
b. A two-tailed test with α = 0.02
c. A right-tailed test with α = 0.005

Factors to Consider in Selecting Statistical Tests

Each test is appropriate under certain conditions.
When selecting a test, consider four factors:
1. the structure of the null hypothesis;
2. the level of measurement allowed, or required, by the test;
3. the sample size;
4. the distribution of the responses (whether or not the distribution is normal).

Steps in Hypothesis Testing
1. Formulate the hypotheses and identify the claim.
2. Determine the critical value(s).
3. Compute the value of the test statistic from the given data.
4. Make a decision by comparing the computed value with the critical value. There are two possibilities:
   If the computed value falls in the critical region (greater than the critical value in a right-tailed test, less than the critical value in a left-tailed test, or beyond either critical value in a two-tailed test), reject the null hypothesis and accept the alternative hypothesis.
   Otherwise, accept (do not reject) the null hypothesis and reject the alternative hypothesis.
5. Summarize the results.

Types of Statistical Tests
1. Z-test
2. T-test
3. Chi-square analysis
4. ANOVA
5. Correlation coefficient

Z Test
The Z-test is the simplest and most common test of the significance of sample data. It requires normality of the distribution, and the sample size should be greater than or equal to 30. This test is one of the parametric tests since it utilizes the two population parameters μ and σ. If the population standard deviation is not known, then the sample standard deviation can be used. The Z-test can be applied in two ways: the one-sample mean test and the two-sample mean test.

One Sample Mean Test
Formula:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

where:
$\bar{X}$ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size

Example:
1. A researcher reports that the average salary of assistant professors is more than 42,000 dollars a year. A sample of 30 assistant professors has a mean salary of 43,260 dollars. At α = 0.05, test the claim that assistant professors earn more than 42,000 dollars a year. The standard deviation of the population is 5230 dollars.
Solution:
Step 1: $H_0: \mu = 42{,}000$ and $H_a: \mu > 42{,}000$ (claim).
Step 2: Since α = 0.05 and the test is a right-tailed test, the critical value is z = +1.65.
Step 3: $z = \dfrac{43{,}260 - 42{,}000}{5230/\sqrt{30}} \approx 1.32$
Step 4: Since 1.32 < 1.65, the computed value does not fall in the critical region, so do not reject the null hypothesis.
Step 5: There is not enough evidence to support the claim that assistant professors earn more than 42,000 dollars a year.

2. The Medical Rehabilitation Education Foundation reports that the average cost of rehabilitation for stroke victims is 24,672 dollars. To see if the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is 25,226 dollars. The standard deviation of the population is 3251 dollars. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at this hospital is different from 24,672 dollars?

Two Sample Mean Test
Formula:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where:
$\bar{X}_1$, $\bar{X}_2$ = the means of samples 1 and 2
$\sigma_1^2$ = the variance of sample 1
$\sigma_2^2$ = the variance of sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2

Critical Values of z at Different Levels of Significance

Test type     α = .01   α = .025   α = .05   α = .10
One-tailed    2.33      1.96       1.645     1.28
Two-tailed    2.575     2.24       1.96      1.645

Examples:
1. A supplier sells ropes. He claims that the ropes have a mean strength of 34 lbs and a variance of 64 (a standard deviation of 8 lbs). A random sample of 32 ropes selected from a shipment yields a mean strength of 31 lbs. Are you going to reject the claim of the supplier at the .05 level?
2. An admission test was administered to incoming freshmen in two colleges. Two independent samples of 150 students each are randomly selected, and the mean scores of the given samples are 88 and 85. Assume that the variances of the test scores are 40 and 35, respectively. Is the difference between the mean scores significant, or can it be attributed to chance? Use the .01 level of significance.
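The critical values and the z statistics for the worked examples can be reproduced with Python's standard library. This is a minimal sketch: `z_critical`, `z_one_sample` and `z_two_sample` are hypothetical helper names, and the critical values come from the inverse of the standard normal CDF (`statistics.NormalDist.inv_cdf`) rather than from a printed table, so they carry more decimals (for example 1.645 rather than 1.65).

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_critical(alpha, tail):
    """Critical value(s) of z; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha is split between two tails
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def z_one_sample(x_bar, mu, sigma, n):
    """One-sample mean test: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def z_two_sample(x_bar1, x_bar2, var1, var2, n1, n2):
    """Two-sample mean test: z = (x_bar1 - x_bar2) / sqrt(var1/n1 + var2/n2)."""
    return (x_bar1 - x_bar2) / sqrt(var1 / n1 + var2 / n2)


# Finding Critical Values, parts a-c
print(round(z_critical(0.10, "left"), 2))              # -1.28
print([round(v, 2) for v in z_critical(0.02, "two")])  # [-2.33, 2.33]
print(round(z_critical(0.005, "right"), 2))            # 2.58

# Salary example (right-tailed, alpha = 0.05): reject H0 only if z exceeds the critical value
z = z_one_sample(43260, 42000, 5230, 30)
print(round(z, 2), z > z_critical(0.05, "right"))      # 1.32 False

# Admission-test example (two-tailed, alpha = 0.01): reject H0 if z lies outside (-c, c)
z = z_two_sample(88, 85, 40, 35, 150, 150)
low, high = z_critical(0.01, "two")
print(round(z, 2), z < low or z > high)                # 4.24 True
```

The salary result agrees with the hand solution (1.32 does not exceed the critical value), and the same helpers apply to the rope and rehabilitation examples.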

T-Test
When the sample is small (n < 30) and only the sample variance is known, use the t-test. The t-test involves the degrees of freedom of the distribution, and the degrees of freedom (df) vary according to the particular type of t-test used.

Degrees of freedom (df) are the number of values that are free to vary after a sample statistic has been computed; they tell the researcher which specific curve to use when a distribution consists of a family of curves.

One Sample Mean Test
Formula:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$

where s is the sample standard deviation and the other symbols are as in the one-sample z-test.

Steps in Hypothesis Testing
1. State the hypotheses and identify the claim.
2. Find the critical value(s).
3. Compute the test value.
4. Make the decision to reject or not reject the null hypothesis.
5. Summarize the results.

Examples:
1. A job placement director claims that the average starting salary for nurses is 24,000 dollars. A sample of 10 nurses' salaries has a mean of 23,450 dollars and a standard deviation of 400 dollars. Is there enough evidence to reject the director's claim at α = 0.05?

Solution:
Step 1: $H_0: \mu = 24{,}000$ (claim) and $H_a: \mu \neq 24{,}000$.
Step 2: The critical values are +2.262 and −2.262 for α = 0.05 and d.f. = 9.
Step 3: $t = \dfrac{23{,}450 - 24{,}000}{400/\sqrt{10}} \approx -4.35$
Step 4: Since −4.35 < −2.262, the computed value falls in the critical region, so reject the null hypothesis.
Step 5: There is enough evidence to reject the claim that the starting salary of nurses is 24,000 dollars.

2. An educator claims that the average salary of substitute teachers in a school district is less than 60 dollars per day. A random sample of eight school districts is selected, and the daily salaries (in dollars) are shown. Is there enough evidence to support the educator's claim at α = 0.10?

60  56  60  55  70  55  60  55

Two Sample Mean Test
Formula:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad df = n_1 + n_2 - 2$$

where $s_1^2$ and $s_2^2$ are the variances of samples 1 and 2.

Exercises:

1. ABC Company, a manufacturer of automobile tires, claims that the average life of its product is 45,600 miles. A random sample of 15 tires was chosen, and it resulted in a mean life of 43,500 miles with a standard deviation of 3,000 miles.
2. It is claimed that the mean drying time of a certain brand of nail polish is less than or equal to 25 minutes. Would you agree with this claim if a random sample of 16 bottles shows a mean drying time of 26 minutes with a standard deviation of 2.4 minutes, using the .01 level of significance?
3. A random sample of 25 cartons of a certain brand of powdered milk showed a mean content of 237 grams with a standard deviation of 8.56 grams, while a sample of 20 cartons of another brand of powdered milk showed a mean content of 240 grams with a standard deviation of 9.75 grams. Using the .05 level of significance, is there a difference in the mean content of the two brands of powdered milk?

CHI-SQUARE TEST
The objective of the chi-square test is to compare the sample (observed) frequencies with the expected frequencies. As in the case of the t-test, the tabular (critical) value of the chi-square statistic depends on two factors: the level of significance and the degrees of freedom. The level of significance in this test need not be divided by two.

TEST FOR INDEPENDENCE
The test for independence is used to determine whether two variables are related or not. Since two variables are involved, the frequencies are entered in a bivariate table or contingency table. The dimension of such a table is defined by the expression r × c, where r indicates the number of rows and c indicates the number of columns. If the null hypothesis of independence is rejected, then a relationship between the two variables exists.

Formula:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where:
$O_{ij}$ = observed number of cases in the ith row of the jth column

$E_{ij}$ = expected number of cases under $H_0$, computed as (total of row i) × (total of column j) / (grand total)

$$df = (r - 1)(c - 1)$$

Note: The test is valid if at least 80% of the cells have expected frequencies of at least 5 and no cell has an expected frequency less than 1. If many expected frequencies are very small, researchers commonly combine categories of variables to obtain a table having larger cell frequencies. Generally, one should not pool categories unless there is a natural way to combine them.

For a 2 × 2 contingency table, a correction called Yates' correction for continuity is applied. The formula then becomes

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}$$

Examples:
1. A survey was conducted to determine whether gender and age are related among stereo shop customers. A total of 200 respondents were surveyed, and the results are presented below. Conduct a test of whether the gender and age of stereo shop customers are independent at the 1% level of significance.

Age            Male   Female   Total
Under 30        60      50      110
30 and over     80      10       90
TOTAL          140      60      200

2. Test whether a person's music preference is related to his intelligence as measured by IQ at the 5% level of significance. The observed frequencies are presented below.

Music Preference   High IQ   Medium IQ   Low IQ   Total
Classical            40         26         17       83
Pop                  47         59         25      131
Rock                 83        104         79      266
TOTAL               170        189        121      480

Correlational Analysis
You are interested in testing the null hypothesis that two variables are not correlated. Both variables are at the interval level of measurement or higher. A normal distribution of responses is not required.

FORMULAS
Pearson r:

$$r = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}{\sqrt{\left[\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2\right]\left[\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2\right]}}$$

where:
X = the scores in the first test
Y = the scores in the second test
N = the number of examinees

Interpretation of the Pearson r
r (positive or negative)     Interpretation
±0.90 to ±1.00               Very high positive/negative correlation
±0.70 to ±0.90               High positive/negative correlation
±0.50 to ±0.70               Moderate positive/negative correlation
±0.30 to ±0.50               Low positive/negative correlation
0.00 to ±0.30                Little, if any, correlation

To know whether the obtained correlation coefficient is significant, that is, whether a real correlation exists or the obtained r is merely due to sampling variation, a t-test for the significance of r can be used.

Formula:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad df = n - 2$$

where:
r = the obtained Pearson r
n = sample size

Example: A study was made to determine the relationship between the grade in Calculus and the grade in the Fortran computer language. A random sample of 10 computer students in a certain university was taken, with the following results. Is the relationship significant at the 0.05 level?

Student no.      1   2   3   4   5   6   7   8   9   10
Calculus (x)    75  83  80  77  89  78  92  86  93  84
Fortran (y)     78  87  78  76  92  81  89  89  91  84

Analysis of Variance (ANOVA)
ANOVA is used when you are interested in testing the null hypothesis that the means of more than two samples are the same. It is very similar to the t-test (in fact, the t-test can be viewed as a special case of ANOVA with two groups). It is used to compare the means of more than two groups and can be used with small samples.
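The two chi-square examples and the Calculus/Fortran correlation example can be worked with a short, standard-library Python sketch. The helper names `chi_square_independence`, `pearson_r` and `t_for_r` are hypothetical. The critical values 6.635 (df = 1, α = 0.01), 9.488 (df = 4, α = 0.05) and 2.306 (df = 8, two-tailed α = 0.05) are assumptions read from standard tables rather than computed, and the correlation test is assumed to be two-tailed. Pearson r is computed in deviation form, which is algebraically equivalent to the raw-score formula given for Pearson r.

```python
from math import sqrt


def chi_square_independence(observed, yates=False):
    """Return (chi-square statistic, df) for an r x c table of observed counts.

    Expected counts are (row total * column total) / grand total.
    With yates=True, 0.5 is subtracted from each |O - E| (intended for 2 x 2 tables).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(o - e) - 0.5 if yates else abs(o - e)
            chi2 += diff ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def pearson_r(x, y):
    """Pearson r in deviation form (equivalent to the raw-score formula)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing the significance of r, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Gender and age (2 x 2, Yates' correction); assumed table value 6.635 for df = 1, alpha = 0.01
gender_age = [[60, 50], [80, 10]]
chi2, df = chi_square_independence(gender_age, yates=True)
print(round(chi2, 2), df, chi2 > 6.635)  # 26.19 1 True

# Music preference and IQ (3 x 3); assumed table value 9.488 for df = 4, alpha = 0.05
music_iq = [[40, 26, 17], [47, 59, 25], [83, 104, 79]]
chi2, df = chi_square_independence(music_iq)
print(round(chi2, 2), df, chi2 > 9.488)

# Calculus vs Fortran grades; assumed two-tailed table value 2.306 for df = 8, alpha = 0.05
calculus = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]
fortran = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]
r = pearson_r(calculus, fortran)
t = t_for_r(r, len(calculus))
print(round(r, 3), round(t, 2), abs(t) > 2.306)  # 0.907 6.09 True
```

Under these assumptions, each statistic exceeds its tabled critical value, so the null hypothesis would be rejected in all three examples. Note that the expected counts are computed from the cell counts, so the table totals serve only as a check.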