nataliemcormier.files.wordpress.com  · web view2017. 10. 15. · objective: not influenced by...


Quantitative Methods Review
October 15, 2017
By Natalie Cormier

Wk 1 and 2: Measurement

To answer questions, we need to collect data as evidence.
If a concept has more than one dimension, then we need more than one measure to capture the multiple sides of that concept.

What are valid measures?
  o Face: the measure appears, on its face, to capture the concept
  o Consensual: agreement among experts in the field
  o Correlational: shown to relate to other accepted measures of the concept
  o Predictive: it correctly predicts a specific outcome

Subjective vs objective
  o Subjective: relies on the judgement of the measurer or of the respondent in a survey
  o Objective: not influenced by personal feelings or opinions in considering or representing the facts
  o Both have problems regarding asking the right questions
  o Larger sample means more precision
  o All measurements have some margin of error

Things to consider while making surveys
  o Timing
  o Question order/priming
  o Different participants measure things differently, mitigated by multiple questions
  o Who is the survey coming from?
  o Unobtrusive measures
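The two notes above, that a larger sample means more precision and that every measurement has some margin of error, can be made concrete with a short sketch. It assumes a simple random sample and uses the usual normal-approximation formula for a survey proportion; the function name `margin_of_error` and the 95% multiplier `z = 1.96` are illustrative choices, not something from the course notes.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p from n respondents.

    Assumes a simple random sample; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Same observed proportion (50%), increasingly large samples.
for n in (100, 400, 1600):
    print(n, margin_of_error(0.5, n))
```

Under these assumptions, quadrupling the sample size halves the margin of error, so precision improves with sample size but with diminishing returns.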

Hawthorne effect: the act of measuring changes the results (people behave differently when they know they are being observed)

Levels of Measurement
  o Interval: countable, numeric values with meaningful distances between them
      What is the average?
  o Ordinal: relative, more or less measurement
      How likely are you to do something? Likely, neutral, unlikely
  o Nominal: indicates same or different, usually a category
      Gender
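One practical consequence of the levels of measurement above is which summary statistic makes sense for each. The sketch below is only an illustration: the responses are made up, and the pairing of mode, median and mean with nominal, ordinal and interval data is the usual textbook convention rather than something stated in these notes.

```python
import statistics

# Hypothetical survey responses, invented for illustration only.
majors = ["Econ", "Biology", "Econ", "History", "Econ"]                 # nominal
likelihood = ["unlikely", "neutral", "likely", "likely", "neutral"]     # ordinal
hours_of_sleep = [6, 7, 5, 8, 7]                                        # interval/ratio

# Nominal: only "same or different" is meaningful, so report the most common category.
most_common_major = statistics.mode(majors)

# Ordinal: categories can be ranked, so the middle (median) response is meaningful.
order = ["unlikely", "neutral", "likely"]
ranks = sorted(order.index(answer) for answer in likelihood)
median_likelihood = order[ranks[len(ranks) // 2]]

# Interval/ratio: distances are meaningful, so an average makes sense.
mean_sleep = statistics.mean(hours_of_sleep)

print(most_common_major, median_likelihood, mean_sleep)
```

Averaging the nominal or ordinal answers directly would require inventing numeric codes for them, which is why the average is reserved for interval-level data here.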

Additional Resources: http://webquiz.ilrn.com/ilrn/quiz-public;jsessionid=C5972CBA32D49F03B627576932218E2D?name=stmr01q%2Fstmr01q_WS_chp03&cookieTest=1


EX: Ordinal, Nominal, Interval
Classify each of the following as N, nominal; O, ordinal; or I/R, interval/ratio data:
  zip code of your local address
  letter grade you will receive in this class
  country you were born in
  amount of money you have with you
  mileage (miles per gallon) your car gets
  brand of chocolate you prefer
  year of your birth (A.D.)
  weight of your pet
  degrees earned by your teachers
  egg sizes (Small, Medium, Large, Extra Large, Jumbo)

Answers:
  zip code of your local address (N)
  letter grade you will receive in this class (O)
  country you were born in (N)
  amount of money you have with you (I/R)
  mileage (miles per gallon) your car gets (I/R)
  brand of chocolate you prefer (N)
  year of your birth (A.D.) (I)
  weight of your pet (R)
  degrees earned by your teachers (O)
  egg sizes (Small, Medium, Large, Extra Large, Jumbo) (O)

EX: Survey Techniques
Say that you want to find out whether college students at Texas A&M think they are getting enough sleep. Design a survey to measure this. Be sure to include some of the questions you would use.

I personally would send a survey to students, to be administered with the end-of-course evaluations, with several questions:
  Do you feel like you are well rested on an average night?
  How many hours do you sleep on an average night?
  Do you think that you need more sleep?
  Is there a difference between your sleep during the school year and during the semester?


Wk 3: Research Design

Variables
  o y = mX + b
  o Our independent variable (X) is influencing our dependent variable (Y)
  o m is a multiplier and the slope of the line if we graphed this
  o b is a constant: the intercept, i.e. the value of Y when X is 0 (the error is captured separately by ɛ)
  o Smoke = β1 (fire) + ɛ
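To make the roles of m, b and ɛ in the line above concrete, here is a minimal least-squares sketch. The data (hours studied and exam scores) and the helper name `fit_line` are hypothetical and only serve to illustrate the idea; the course may have used different software or notation.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope: change in Y per one-unit change in X
    b = mean_y - m * mean_x    # intercept: predicted Y when X is 0
    return m, b

# Hypothetical data: X = hours studied, Y = exam score.
hours_studied = [0, 1, 2, 3, 4]
exam_score = [52, 58, 61, 67, 72]

m, b = fit_line(hours_studied, exam_score)

# The error term is whatever the line does not explain for each observation.
errors = [y - (m * x + b) for x, y in zip(hours_studied, exam_score)]
print(m, b, errors)
```

In this sketch the intercept b is a property of the fitted line, while the errors vary from one observation to the next and sum to roughly zero.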

How do we decide which is X and which is Y?
  o Plausible hypothesis:
      A theory we can test
  o Hypotheses must be rational
      Positive correlation/slope:
        An increase in the independent variable means an increase in the dependent variable
        A decrease in the independent variable means a decrease in the dependent variable
      Negative/inverse correlation/slope:
        An increase in the independent variable means a decrease in the dependent variable
        A decrease in the independent variable means an increase in the dependent variable
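The sign of a correlation, as described above, can be checked numerically. The sketch below computes Pearson's r by hand for two small made-up datasets; the variable names and numbers are purely illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_rises_with_x = [2, 4, 5, 4, 6]   # hypothetical: tends to increase as x increases
y_falls_with_x = [9, 7, 6, 4, 1]   # hypothetical: tends to decrease as x increases

r_positive = pearson_r(x, y_rises_with_x)
r_negative = pearson_r(x, y_falls_with_x)
print(r_positive, r_negative)
```

A positive r corresponds to a positive slope and a negative r to a negative (inverse) slope; r always lies between -1 and 1.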


Causation
  o Time order
      Feedback effects
      One must come before the other, no reverse causality
  o Need correlation
      Co-variation
      They must be related
  o Cannot be spurious
      No OVB (omitted variable bias): rule out possible third factors
      Spurious means that X and Y look like they are related, but the relationship is driven by a missing third factor (the omitted variable)
  o Theory
      How and why
  o EX: Attending TAMU Sporting Events
      Attending TAMU sporting events = β1 (school spirit) + ɛ
  o Correlation is not causation
      Reverse causality
        If Y increases when X increases, it may be Y that is driving X rather than X driving Y
      Omitted variable bias
        A third factor moves both X and Y in the same direction
  o Statistical significance does not equal meaningful significance
  o Internal Validity
      We are answering the questions that we intend to answer
      Did the independent variable cause change in the dependent variable?
      Time order, covariation, nonspurious, theory
  o External Validity
      Can the results of this study hold outside this context?
      Is it possible to generalize this to other organizations/places?
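The idea of a spurious relationship caused by an omitted third factor can be illustrated with a small simulation. Everything here is an assumption made for the illustration: the data are randomly generated, z plays the role of the omitted variable, and y is constructed so that it depends on z only and not on x. Controlling for z is done by regressing both x and y on z first and then relating what is left over.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(ys, xs):
    """Part of ys left over after removing a straight-line fit on xs."""
    m = slope(xs, ys)
    b = mean(ys) - m * mean(xs)
    return [y - (m * x + b) for x, y in zip(xs, ys)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # the omitted third factor
x = [zi + rng.gauss(0, 1) for zi in z]         # x is partly driven by z
y = [2 * zi + rng.gauss(0, 1) for zi in z]     # y is driven by z only, not by x

# Ignoring z, x appears to "explain" y.
naive_slope = slope(x, y)

# Controlling for z: relate what is left of x and y after removing z from both.
controlled_slope = slope(residuals(x, z), residuals(y, z))

print(naive_slope, controlled_slope)
```

In this setup the naive slope is clearly positive even though x has no effect on y, and it shrinks to roughly zero once z is accounted for, which is the pattern the notes call spurious.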


DESIGN
  o Experimental
      Internal validity
      Can control and test for causality
      Participants are randomly assigned to experimental and control groups
      Measure in pre-period
      'Treat' experimental group
      Measure post-treatment
      Differences are observable
      Why might this have high internal validity?
        You are only changing one variable, so no OVB
        There is a time order, so no reverse causality
      What if there is a strong external shock?
        We cannot control for this, but it will pollute both the control and the experimental group

  o Quasi-Experimental
      Natural experiment
      Uses data available
      Ex: is students' academic achievement influenced by extracurricular activities?
      Why might this create more external validity?
        Potentially generalize this result to other populations

Types of Quasi-Experimental Design
  o Cross sectional
  o Case study
  o Time series
  o Longitudinal

Additional Resources: http://libguides.usc.edu/writingguide/researchdesigns
In-depth descriptions of each research design
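The experimental steps listed under DESIGN (random assignment, pre-period measure, treatment, post-treatment measure, compare differences) can be sketched as follows. All numbers are hypothetical: the treatment effect and the external shock are set by hand so that it is easy to see what the comparison recovers.

```python
import random

rng = random.Random(42)            # fixed seed so the assignment is repeatable
participants = list(range(20))
rng.shuffle(participants)          # random assignment
treatment_ids = set(participants[:10])
control_ids = set(participants[10:])

TREATMENT_EFFECT = 5.0   # hypothetical true effect of the treatment
EXTERNAL_SHOCK = 3.0     # hypothetical shock that hits everyone between measurements

# Pre-period measurement (made-up baseline scores).
pre = {pid: 50.0 + pid for pid in range(20)}

# Post-treatment measurement: everyone gets the shock, only the treated get the effect.
post = {
    pid: pre[pid] + EXTERNAL_SHOCK + (TREATMENT_EFFECT if pid in treatment_ids else 0.0)
    for pid in range(20)
}

def mean_change(ids):
    """Average post-minus-pre change for a group of participants."""
    return sum(post[i] - pre[i] for i in ids) / len(ids)

estimated_effect = mean_change(treatment_ids) - mean_change(control_ids)
print(estimated_effect)
```

Because the shock changes both groups by the same amount in this sketch, it cancels out when the change in the control group is subtracted from the change in the experimental group. A shock that hit the two groups differently would not cancel.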


  o Cross Sectional:
      Data collected from different people at the same point in time
      One observation per unit (identity), covering multiple variables
      We don't know the cause, just correlation

      Country   Time   GDP
      India     1999
      China     1999
      Brazil    1999

Definition and Purpose
Cross-sectional research designs have three distinctive features: no time dimension; a reliance on existing differences rather than change following intervention; and, groups are selected based on existing differences rather than random allocation. The cross-sectional design can only measure differences between or from among a variety of people, subjects, or phenomena rather than a process of change. As such, researchers using this design can only employ a relatively passive approach to making causal inferences based on findings.

What do these studies tell you?

1. Cross-sectional studies provide a clear 'snapshot' of the outcome and the characteristics associated with it, at a specific point in time.

2. Unlike an experimental design, where there is an active intervention by the researcher to produce and measure change or to create differences, cross-sectional designs focus on studying and drawing inferences from existing differences between people, subjects, or phenomena.

3. Entails collecting data at and concerning one point in time. While longitudinal studies involve taking multiple measures over an extended period of time, cross-sectional research is focused on finding relationships between variables at one moment in time.

4. Groups identified for study are purposely selected based upon existing differences in the sample rather than seeking random sampling.

5. Cross-sectional studies are capable of using data from a large number of subjects and, unlike observational studies, are not geographically bound.

6. Can estimate prevalence of an outcome of interest because the sample is usually taken from the whole population.

7. Because cross-sectional designs generally use survey techniques to gather data, they are relatively inexpensive and take up little time to conduct.

What these studies don't tell you?
1. Finding people, subjects, or phenomena to study that are very similar except in one specific variable can be difficult.
2. Results are static and time bound and, therefore, give no indication of a sequence of events or reveal historical or temporal contexts.
3. Studies cannot be utilized to establish cause and effect relationships.
4. This design only provides a snapshot of analysis so there is always the possibility that a study could have differing results if another time-frame had been chosen.
5. There is no follow up to the findings.


  o Case Study:
      An in-depth look at a particular situation, looking at the broad causes and historical influences
      Looks at a singular case; cannot be applied to more than that singular group

Definition and Purpose
A case study is an in-depth study of a particular research problem rather than a sweeping statistical survey or comprehensive comparative inquiry. It is often used to narrow down a very broad field of research into one or a few easily researchable examples. The case study research design is also useful for testing whether a specific theory and model actually applies to phenomena in the real world. It is a useful design when not much is known about an issue or phenomenon.

What do these studies tell you?

1. Approach excels at bringing us to an understanding of a complex issue through detailed contextual analysis of a limited number of events or conditions and their relationships.

2. A researcher using a case study design can apply a variety of methodologies and rely on a variety of sources to investigate a research problem.

3. Design can extend experience or add strength to what is already known through previous research.

4. Social scientists, in particular, make wide use of this research design to examine contemporary real-life situations and provide the basis for the application of concepts and theories and the extension of methodologies.

5. The design can provide detailed descriptions of specific and rare cases.

What these studies don't tell you?

1. A single or small number of cases offers little basis for establishing reliability or to generalize the findings to a wider population of people, places, or things.

2. Intense exposure to the study of a case may bias a researcher's interpretation of the findings.

3. Design does not facilitate assessment of cause and effect relationships.

4. Vital information may be missing, making the case hard to interpret.

5. The case may not be representative or typical of the larger problem being investigated.

6. If the criterion for selecting a case is that it represents a very unusual or unique phenomenon or problem for study, then your interpretation of the findings can only apply to that particular case.


  o Time Series/Trend Studies:
      Data collected for the same element at different points of time
      Multiple observations of one variable over time
      It's good for explaining causal relationships

      Country   Time   GDP
      India     1999
      India     2000
      India     2001

Definition and Purpose
Often used in the medical sciences, but also found in the applied social sciences, a cohort study generally refers to a study conducted over a period of time involving members of a population which the subject or representative member comes from, and who are united by some commonality or similarity. Using a quantitative framework, a cohort study makes note of statistical occurrence within a specialized subgroup, united by same or similar characteristics that are relevant to the research problem being investigated, rather than studying statistical occurrence within the general population. Using a qualitative framework, cohort studies generally gather data using methods of observation. Cohorts can be either "open" or "closed."

Open Cohort Studies [dynamic populations, such as the population of Los Angeles] involve a population that is defined just by the state of being a part of the study in question (and being monitored for the outcome). Date of entry and exit from the study is individually defined, therefore, the size of the study population is not constant. In open cohort studies, researchers can only calculate rate based data, such as, incidence rates and variants thereof.

Closed Cohort Studies [static populations, such as patients entered into a clinical trial] involve participants who enter into the study at one defining point in time and where it is presumed that no new participants can enter the cohort. Given this, the number of study participants remains constant (or can only decrease).

What do these studies tell you?
1. The use of cohorts is often mandatory because a randomized control study may be unethical. For example, you cannot deliberately expose people to asbestos, you can only study its effects on those who have already been exposed. Research that measures risk factors often relies upon cohort designs.

2. Because cohort studies measure potential causes before the outcome has occurred, they can demonstrate that these “causes” preceded the outcome, thereby avoiding the debate as to which is the cause and which is the effect.

3. Cohort analysis is highly flexible and can provide insight into effects over time and related to a variety of different types of changes [e.g., social, cultural, political, economic, etc.].

4. Either original data or secondary data can be used in this design.

What these studies don't tell you?
1. In cases where a comparative analysis of two cohorts is made [e.g., studying the effects of one group exposed to asbestos and one that has not], a researcher cannot control for all other factors that might differ between the two groups. These factors are known as confounding variables.

2. Cohort studies can end up taking a long time to complete if the researcher must wait for the conditions of interest to develop within the group. This also increases the chance that key variables change during the course of the study, potentially impacting the validity of the findings.

3. Due to the lack of randomization in the cohort design, its external validity is lower than that of study designs where the researcher randomly assigns participants.
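The cohort notes above mention rate-based data such as incidence rates and a comparison between an exposed and an unexposed cohort. A minimal sketch of that arithmetic follows. The case counts and person-years are invented for illustration, and the rate ratio and rate difference are standard epidemiological summaries that these notes do not define themselves.

```python
def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per person-year of follow-up."""
    return new_cases / person_years

# Hypothetical cohorts: counts are made up, not taken from any study.
exposed_rate = incidence_rate(new_cases=30, person_years=1000)
unexposed_rate = incidence_rate(new_cases=10, person_years=1000)

# How many times higher the rate is among the exposed.
rate_ratio = exposed_rate / unexposed_rate

# Extra cases per person-year associated with exposure.
rate_difference = exposed_rate - unexposed_rate

print(rate_ratio, rate_difference)
```

As the notes point out, a difference like this may still reflect confounding variables, because the two cohorts were not formed by random assignment.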


  o Longitudinal:
      Repeated cross-sections
      Panel: cross-sectional time-series
      Good for both causation and correlation

      Country   Time   GDP
      India     1999
      India     2000
      India     2001
      China     1999
      China     2000
      China     2001
      Brazil    1999
      Brazil    2000
      Brazil    2001

In a longitudinal study subjects are followed over time with continuous or repeated monitoring of risk factors or health outcomes, or both. Such investigations vary enormously in their size and complexity. At one extreme a large population may be studied over decades. For example, the longitudinal study of the Office of Population Censuses and Surveys prospectively follows a 1% sample of the British population that was initially identified at the 1971 census. Outcomes such as mortality and incidence of cancer have been related to employment status, housing, and other variables measured at successive censuses. At the other extreme, some longitudinal studies follow up relatively small groups for a few days or weeks. Thus, firemen acutely exposed to noxious fumes might be monitored to identify any immediate effects.

Most longitudinal studies examine associations between exposure to known or suspected causes of disease and subsequent morbidity or mortality. In the simplest design a sample or cohort of subjects exposed to a risk factor is identified along with a sample of unexposed controls. The two groups are then followed up prospectively, and the incidence of disease in each is measured. By comparing the incidence rates, attributable and relative risks can be estimated. Allowance can be made for suspected confounding factors either by matching the controls to the exposed subjects so that they have a similar pattern of exposure to the confounder, or by measuring exposure to the confounder in each group and adjusting for any difference in the statistical analysis.
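The Country/Time/GDP tables in these notes differ only in which rows they contain, and a short sketch can show how a cross-section and a time series are both slices of a panel. The `Observation` type is a hypothetical helper, and the GDP field is deliberately left empty because the notes give no GDP values.

```python
from typing import NamedTuple, Optional

class Observation(NamedTuple):
    country: str
    year: int
    gdp: Optional[float]   # left as None: the source tables list no GDP figures

countries = ["India", "China", "Brazil"]
years = [1999, 2000, 2001]

# Panel (longitudinal): every country observed in every year.
panel = [Observation(c, y, None) for c in countries for y in years]

# Cross-sectional: many countries, one point in time.
cross_section = [obs for obs in panel if obs.year == 1999]

# Time series: one country, many points in time.
time_series = [obs for obs in panel if obs.country == "India"]

print(len(panel), len(cross_section), len(time_series))
```

Seen this way, the panel contains both of the other designs, which fits the note that it is a cross-sectional time-series.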


EX: Independent and Dependent Variables, Study Design
Identify the IV, DV, and study design for each based on what you know.

1. A researcher hypothesizes that blondes really do have more fun. To test this hypothesis, she interviews a natural brunette who has recently become a blonde to determine if there is any change in the amount of fun she has.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

2. A developmental psychologist is testing the hypothesis that children in first grade know more words in the English language than children in Kindergarten. To test this, she sits in on two classes (one first grade, the other Kindergarten) and counts the average number of words children in each class speak. She then compares the counts.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

3. A clinical psychologist hypothesizes that people who have been diagnosed as having major depression will be more likely to also be diagnosed with an anxiety disorder than will people who have not been diagnosed with major depression. To test this, he gives a survey to 100 people being treated for depression and 100 people with no known mental disorder. The survey asks them to report whether or not they have been diagnosed as having an anxiety disorder.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

4. A pharmacologist is testing whether a new anti-anxiety medication, Moodcor, will cause people to gain weight. To test this, she gives 100 people Moodcor for one month and 100 people a placebo drug. At the end of the month, she monitors any weight gain.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

5. A developmental psychologist believes that if children successfully lie to their friends, they will be more likely to try lying to their parents. To test this hypothesis, he asks 50 children to report how many times in the last month they have lied to their friends, and whether they were successful. He then asks them how many times they have lied to their parents.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________


6. A personality psychologist believes that people who are more aggressive are more likely to purchase sports coupes than people who are less aggressive. To test this, he visits local car dealerships and asks car shoppers to complete an aggression survey. Then, he observes what types of cars they purchase (sports coupe, sedan, SUV, or pickup).
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

7. A clinical psychologist hypothesizes that listening to an inspirational tape will lead one to be in a better mood. To test this, she has 50 people listen to an hour-long inspirational tape. Another 50 listen to white noise for an hour. She then has them rate their mood on a 10-point scale.
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________

8. A clinical psychologist is testing his theory that people who experienced a brain injury are developmentally delayed to the age at which they experienced the injury (for instance, if one has a brain injury at the age of 10, that person will always act like they are 10). To test this, he conducts developmental interviews with two people who experienced a brain injury at two different ages (one was 3 and one was 20).
Independent variable: ________________________________
Dependent variable: ________________________________
Type of study: ________________________________


Answers (With Explanations)

1. IV: Hair color
DV: Amount of fun
Type of study: Case study, because one person is being interviewed.

2. IV: Age (first grade or Kindergarten)
DV: Number of words known
Type of study: Naturalistic observation/quasi-experimental. She is sitting in on classrooms and observing what the kids are doing without their knowledge.

3. IV: Whether one has depression or not
DV: Presence of anxiety disorder
Type of study: Survey

4. IV: Getting the medication
DV: Weight gain
Type of study: Experimental. This is because the researcher is the one giving the people the medication or the placebo drug. That is, the researcher is manipulating or controlling the presence of the IV.

5. IV: Successfully lying to friends
DV: Lying to parents
Type of study: This is a survey/cross-sectional study. The key here is that there are a lot of people in the study (50 children). This is NOT a case study because there are too many people. Case studies only contain a handful of participants.

6. IV: Aggressiveness
DV: Whether a sports coupe is purchased
Type of study: This is a tricky one. It is actually a combination of a survey/cross-sectional study AND a naturalistic observation/quasi-experimental study. This is because the aggressiveness scores are obtained with a survey. But, information on the type of car purchased is obtained by watching what the participants do. This study demonstrates that you can often use more than one method simultaneously!

7. IV: Whether one listens to the inspirational tape, or the white noise
DV: Mood
Type of study: Experimental. This is a lab experiment because, again, the researcher is manipulating who gets the IV. That is, she is determining whether the participants listen to the tape, or the white noise.

8. IV: Brain injury
DV: Age one is developmentally delayed
Type of study: Case study. Another case study…this one with two participants. If the study only contains a few people (say, about 5 or fewer), chances are it is a case study!


EX: OVB
Write three examples of OVBs concerning getting good grades because of eating pizza.

You have more time to go to workshops to get free pizza and time to study.
You don’t have money and so you concentrate on school and eat the pizza at school because it is free.
You get good grades because you spend more time getting to know professors and you eat more pizza because you spend more time at school.
(these are kind of a stretch but yeah)

EX: Program Evaluation
Description:
At one large university, a group of education specialists wanted to test the effectiveness of a newly designed academic improvement course. Students seeking academic help at the university counseling center were asked to participate in this 6-week program. Only students who were judged to have deficiencies in reading comprehension and other study-related skills were chosen for the program. That is, students whose academic problems were judged to be the result of emotional difficulties of one sort or another were not enrolled in the program but were counseled in a manner more appropriate to their problem. A group of 30 students completed the program at the counseling center. A review of grades and teachers' comments revealed that a large majority of the students were doing better in school after completing the program than before. The difference between preprogram and postprogram performance measures was statistically significant.

Questions:
Identify three major threats to the internal validity of this study as it has been described. That is, show why there are at least three plausible hypotheses for the obtained effect other than that based on the effectiveness of the academic improvement course.

One possible threat to the internal validity of this study is maturation. Because the study includes no comparison group there is no way to refute the possibility that these students might have improved as part of their own development as college students (independent of the special program).

A second possible threat to internal validity in this study is regression. This threat may be particularly likely given that students were selected for their poor performance. If the instrument used to identify these students as having reading comprehension deficiencies was not perfectly reliable, then the students' apparent improvement may have reflected their regression toward the mean.

Finally, history could have posed a threat to the internal validity of this study. Perhaps the professors or teaching assistants in the courses in which these students were enrolled gave them additional tutoring or instruction in study skills. This could easily have gone on while the students were enrolled in the special program. The absence of a comparison group again makes it impossible to tell whether the program or their experiences in their courses was responsible for their improvement.
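The regression threat described in the answer above can be seen in a small simulation. The setup is entirely hypothetical: each simulated student has a stable ability plus independent measurement noise on two tests, no program of any kind is applied between the tests, and the lowest-scoring tenth on the first test is selected, mirroring how students were chosen for their poor performance.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable
n = 10_000

# Stable underlying ability, plus independent noise on each test (an unreliable instrument).
ability = [rng.gauss(70, 10) for _ in range(n)]
test1 = [a + rng.gauss(0, 10) for a in ability]
test2 = [a + rng.gauss(0, 10) for a in ability]   # no program happens in between

# Select the lowest-scoring tenth on the first test.
cutoff = sorted(test1)[n // 10]
selected = [i for i in range(n) if test1[i] < cutoff]

mean_before = sum(test1[i] for i in selected) / len(selected)
mean_after = sum(test2[i] for i in selected) / len(selected)
apparent_gain = mean_after - mean_before

print(mean_before, mean_after, apparent_gain)
```

The selected group scores noticeably higher on the second test even though nothing was done for them, which is why a pre/post improvement without a comparison group is weak evidence that a program worked.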


Explain how a nonequivalent control group design could be used to strengthen the internal validity of this investigation of the effect of the academic improvement program. Be sure to identify how you would obtain a control group and what measures would have to be obtained to carry out this type of design.

In the previous answer reference was made to the absence of a comparison group in the present study and how this contributed to an inability to refute alternative plausible hypotheses for the effects of the special program. A nonequivalent control group design would add this comparison group and would provide both pretest and posttest measures for both the treatment and the control groups. It is unlikely that a comparable control group could be formed using a true experiment because students would have to accept being randomly assigned to the treatment and control groups. Thus, the nonequivalent control group design would be the most practical way to obtain a comparison group. The nonequivalent control group should be made up of students as similar as possible to those identified as needing the special program. For example, students who have been identified by their teachers as performing poorly in these areas might constitute a comparison group. They would differ from the treatment group because they would not receive the treatment. They would represent a nonequivalent control group because the treatment and control groups would not have been formed through random assignment.

Identify two issues of external validity that might concern an administrator of another university who is considering implementing this program at her institution.

If the population of students at her school was substantially different, the administrator would be concerned with generalizing the results of the present study to those likely to be obtained at her institution. She might also be concerned about the external validity of the findings of the present study if her institution did not have the resources to implement the special program in the same way it was done in the present study.


Description:
A social scientist was hired by a local police department to evaluate the effectiveness of a program aimed at reducing juvenile delinquency in the community. When the social scientist arrived on the scene, the program had already begun. It included a three-pronged educational campaign directed at parents, community residents, and adolescents attending local schools. Public service messages appeared on television and radio; police officers and social workers made personal appearances in school classrooms; police officers went door to door in the highest crime areas to help educate residents about community resources for troubled teens. The program lasted 6 weeks.

Questions:Describe how a quasi-experimental design, specifically a simple time-series design, might be used by the scientist as part of the program evaluation. Be sure to provide details regarding relevant procedures and measures for this type of design.A time-series design would involve examining the police department records for the incidence of juvenile delinquency and any other measures that would be pertinent to the potential effectiveness of the program. The investigator then would examine records prior to the beginning of the program. The investigator would continue to monitor these measures during the program and after the program had been completed. The time-series design would demonstrate that the program had been effective if there was an abrupt change in the critical measures coincident with the implementation of the program.

Discuss the major threats to internal validity associated with the design that you have outlined.

The major threats to the internal validity of a time-series design are history and instrumentation. In this situation history seems to be the more plausible threat because archival records are being used to measure the dependent variables. If the police department is sufficiently concerned about juvenile delinquency in the community to have begun such an extensive program, it is likely that others in the community are also concerned and may have introduced other programs. This general level of concern and possible concurrent programs addressing problems of juvenile delinquency would provide potential history threats to the internal validity of the police department's study. A potential instrumentation threat exists if there is a change in the ways delinquent activities are recorded during the period of observation.

The internal validity of a simple time-series design can be strengthened by including a nonequivalent control group. (a) Suggest a possible control group that the social scientist might consider in this situation. (b) Identify what measures must be obtained from the control group.

A: The social scientist might consider a similar neighboring community to serve as a nonequivalent control group in this study. The sample drawn from the neighboring community should be as similar as possible to the sample undergoing the program.

B: The treatment group and the nonequivalent control group should be assessed on the same dependent variables both before and after the program was implemented.


Wk 4

Are questions subjective or objective?
o More Resources: http://www.asdatoz.com/Documents/Website-%20Objective%20vs%20subjective%20ltr.pdf

Why graph data?
o Ease of interpretation
o Presentation

Frequency distribution table (Tabulation in STATA)
o Pairs the data values with their frequency of occurrence
o Most basic way to restructure raw data
o Think birth months, or whether or not you tailgated

Frequency polygon
o Creates a shape of frequency
o You can compare and see trends more easily

Histogram
o Emphasizes distinct groups
o Categorizes data based on number of occurrences
o More Resources: https://www.lcps.org/cms/lib4/VA01000195/Centricity/Domain/14255/Histograms%20Multiple%20Choice%20Practice.pdf

[Figure: histogram, "Age Distribution of Participants in Cancer Study". x-axis: patient's age at start of exp. (45 to 70); y-axis: frequency (0 to 15). Data: sample cancer dataset.]


Central tendency
o A number, score, or data value that represents the average of a group of data
o Mean: the sum of all observations divided by the number of observations
o Mode: the most common value
   o May not be central or near the middle
   o Can take on more than one value
   o Much more useful for less precise levels of measurement (ex: satisfaction)
o Median: the middle observation in a set of numbers when the observations are ranked in order
   o If there is an even number of observations, take the average of the two middle values
   o Not influenced by extreme values
   o If items don't cluster near the median, the median will not be a good measure of the group's central tendency
o Mean vs median vs mode
   o For skewed data or extreme values, use the median

Types of data and central tendency
o Interval data (countable): median, mean, or mode
o Ordinal data (relative ranking in survey data): median or mode
o Nominal data (indicates same or different): mode

Standard Deviation
o Measure of dispersion

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
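The standard deviation formula can be sketched in a few lines of Python. This is only an illustration: the function name `population_sd` and the data set `scores` are made up for this example and do not come from the course material. The function divides by N, as in the formula above.

```python
import math

def population_sd(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

# Hypothetical data set (not from the notes), chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(scores))  # 2.0
```

Here the mean is 5, the squared deviations add up to 32, and 32 / 8 = 4, whose square root is 2.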


Distribution shapes
o Symmetric: evenly balanced distribution
   o Mean = median = mode


o Uniform distribution Each value occurs with same frequency

o Bimodal


o Negative and positive skewed


o Kurtosis: the sharpness of the peak, or peakedness, of the data


EX: Histograms

Example 1:
The histogram below shows the distribution of heights (in cm) of 30 people.
a) How many people have heights between 159.5 and 169.5 cm?
b) How many people have heights less than 159.5 cm?
c) How many people have heights more than 169.5 cm?
d) What percentage of people have heights between 149.5 and 179.5 cm?

[Figure: histogram of heights of people]

Solution to Example 1:
a) 7 people
b) 9 + 6 = 15 people
c) 5 + 2 + 1 = 8 people
d) (9 + 7 + 5)/30 = 0.7 = 70%

Example 2:
The histogram below shows the level of cholesterol (in mg per dl) of 200 people.
a) How many people have a level of cholesterol between 205 and 210 mg per dl?
b) How many people have a level of cholesterol less than 205 mg per dl?
c) What percentage of people have a level of cholesterol more than 215 mg per dl?
d) How many people have a level of cholesterol between 205 and 220 mg per dl?

Solution to Example 2:
Note that the relative frequency is shown on the vertical axis.
a) 0.2*200 = 40 people
b) (0.05 + 0.1)*200 = 30 people
c) (0.25 + 0.05) = 0.3 = 30%
d) (0.2 + 0.35 + 0.25)*200 = 160 people


Example 3:
The histogram below shows the efficiency level (in miles per gallon) of 110 cars.
a) How many cars have an efficiency between 15 and 20 miles per gallon?
b) How many cars have an efficiency of more than 20 miles per gallon?
c) What percentage of cars have an efficiency of less than 20 miles per gallon?

[Figure: histogram of car efficiency]

Solution to Example 3:
a) 35 cars
b) 25 + 15 = 40 cars
c) (15 + 20 + 35) / 110 = 0.636 = 63.6%

EX: Building a Histogram

1. Peter and Chris Griffin go to a hot dog eating contest. The following data show how many hot dogs each person ate in 1 hour.
Hot Dogs Eaten in One Hour: 83, 76, 90, 58, 66, 44, 86, 66, 61, 59, 50, 53, 61, 64, 73
a. Construct a histogram that displays these results.


2. Make a histogram for the data below:

Number of Free Throws    Frequency
0-1                      1
2-3                      5
4-5                      10
6-7                      4

3. Use the following histograms to answer each question.
a) Which distribution had collected more data? Show how you know.
Histogram A has more data because it has a higher total frequency.
b) Which distribution has a larger range? Show or explain how you know.
Histogram A has a larger range because its data range from 10 to 50, while the other only goes from 0 to 40.
c) Which distribution is more likely to have a shape described as “skewed right”?
Histogram B, where the mean is greater than the median.
d) Which distribution is more likely to have a higher median than mean? Explain why this would happen.
Histogram A, which is left skewed: the low values in the left tail pull the mean below the median.


4. The numbers below indicate the number of Summer Olympic medals awarded to athletes from the United States during 18 Summer Olympic Games.
U.S. Summer Olympic Medals: 55, 56, 62, 71, 74, 76, 84, 90, 94, 94, 96, 101, 103, 103, 103, 107, 108, 174
a. Construct a histogram of these data.
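One possible way to tabulate the Olympic medal data before drawing the histogram is sketched below in Python. The class width of 20 (`BIN_WIDTH`) and the helper name `bin_counts` are assumptions made for this illustration; the exercise does not prescribe a class width, and other widths are equally valid.

```python
from collections import Counter

medals = [55, 56, 62, 71, 74, 76, 84, 90, 94, 94, 96,
          101, 103, 103, 103, 107, 108, 174]

BIN_WIDTH = 20  # assumed class width; the exercise does not specify one

def bin_counts(values, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    first, last = min(counts), max(counts)
    # Include empty classes so gaps in the data stay visible.
    return {start: counts.get(start, 0)
            for start in range(first, last + width, width)}

# Print a simple text histogram, one row per class.
for start, count in bin_counts(medals, BIN_WIDTH).items():
    print(f"{start:3d}-{start + BIN_WIDTH - 1:3d} | {'#' * count} ({count})")
```

With this class width the two empty classes (120-139 and 140-159) make the outlying value of 174 easy to spot, which is one reason to keep empty classes in the table.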

EX: Central Tendency

A high school teacher at a small private school assigns trigonometry practice problems to be worked via the net. Students must use a password to access the problems and the time of log-in and log-off are automatically recorded for the teacher. At the end of the week, the teacher examines the amount of time each student spent working the assigned problems. The data is provided below in minutes.

15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39

Find the Mean, Median, and Mode for the above data.
Mean: 30.13, Median: 27, Mode: 22

What does this information tell you about students' length of time on the computer solving trigonometry problems?
On average, students spent about 30 minutes. Half of the students spent more than 27 minutes and half spent less. More students spent 22 minutes solving problems than any other amount of time.

Is this data skewed?
The data indicate a slight positive skew. This is most likely due to the students who spent over 40 minutes working on the trigonometry problems. Note that it is a very slight skew: only approximately a 3-minute difference between the mean and median.
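The trigonometry answers can be checked with Python's standard `statistics` module. This is a quick sketch for verification only; the variable names are chosen for this example.

```python
import statistics

# Minutes each student spent on the trigonometry problems.
minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

mean = statistics.mean(minutes)
median = statistics.median(minutes)
mode = statistics.mode(minutes)

print(round(mean, 2), median, mode)  # 30.13 27 22
# A mean above the median is consistent with a positive (right) skew.
print(mean > median)  # True
```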

A group committed to quality television has been concerned about a new talk show. For two weeks, they decide to count the number of words that must be "bleeped" as too obscene for television and the number of physical altercations. They hope that after recording this data they will be able to argue that the show is inappropriate for television, particularly during the day. The data for number of words censored is provided below.

342, 267, 321, 157, 33, 349, 254, 166, 132, 289

Find the Mean, Median, and Mode for the above data. What does this information tell you about the talk show? Is this data skewed?

Mean: 231.0
Median: The median is 260.5. The two middle scores are 254 and 267. By adding these two numbers together and dividing by 2, I find the median = 260.5. Half of the scores fall above this number and half fall below.
Mode: This data set has no mode; no number occurs more than once.

What does this information tell you about the talk show?
All things considered, probably not the best show for your kids to watch (particularly if they can lip read). The mean number of "bleeped" words per show is 231 words. Half of the shows have to censor over approximately 260 words and half censor less. There is no mode: each show appears to be unique as to the number of words "bleeped".

Is this data skewed?
The show with only 33 words censored has caused a negative skew and distorted the mean downwards a bit.

More Resources: http://faculty.webster.edu/woolflm/3danswer.html


EX: Mean and Standard Deviation

1. Consider the following three data sets A, B and C.

A = {9,10,11,7,13}

B = {10,10,10,10,10}

C = {1,1,10,19,19}

a) Calculate the mean of each data set. 

b) Calculate the standard deviation of each data set. 

c) Which set has the largest standard deviation? 

d) Is it possible to answer question c) without calculations of the standard deviation? 

2. A given data set has a mean μ and a standard deviation σ. 

a) What are the new values of the mean and the standard deviation if the same constant k is added to each data value in the given set? Explain.

b) What are the new values of the mean and the standard deviation if each data value of the set is multiplied by the same constant k? Explain.

3. If the standard deviation of a given data set is equal to zero, what can we say about the data values included in the given data set? 


4. The frequency table of the monthly salaries of 20 people is shown below.

salary (in $)    frequency
3500             5
4000             8
4200             5
4300             2

a) Calculate the mean of the salaries of the 20 people. 

b) Calculate the standard deviation of the salaries of the 20 people. 

5. The following table shows the grouped data, in classes, for the heights of 50 people.

height (in cm) - classes    frequency
120 <- 130                  2
130 <- 140                  5
140 <- 150                  25
150 <- 160                  10
160 <- 170                  8

a) Calculate the mean of the height of the 50 people.

b) Calculate the standard deviation of the height of the 50 people.

More Resources: http://www.analyzemath.com/statistics.html


Solutions:

1. a) Mean of data set A = (9+10+11+7+13)/5 = 10
Mean of data set B = (10+10+10+10+10)/5 = 10
Mean of data set C = (1+1+10+19+19)/5 = 10
b) Standard deviation of data set A = √[ ((9−10)² + (10−10)² + (11−10)² + (7−10)² + (13−10)²)/5 ] = 2
Standard deviation of data set B = √[ ((10−10)² + (10−10)² + (10−10)² + (10−10)² + (10−10)²)/5 ] = 0
Standard deviation of data set C = √[ ((1−10)² + (1−10)² + (10−10)² + (19−10)² + (19−10)²)/5 ] = 8.05
c) Data set C has the largest standard deviation.
d) Yes, since data set C has data values that are farther from the mean than those in sets A and B.

2. We limit the discussion to a data set with 3 values for simplicity, but the conclusions are true for any data set with quantitative data. Let x, y and z be the data values making up a data set.
The mean μ = (x + y + z) / 3
The standard deviation σ = √[ ((x − μ)² + (y − μ)² + (z − μ)²)/3 ]
a) We now add a constant k to each data value and calculate the new mean μ'.
μ' = ((x + k) + (y + k) + (z + k)) / 3 = (x + y + z) / 3 + 3k/3 = μ + k
We now calculate the new standard deviation σ'.
σ' = √[ ((x + k − μ')² + (y + k − μ')² + (z + k − μ')²)/3 ]
Note that x + k − μ' = x + k − μ − k = x − μ, also y + k − μ' = y + k − μ − k = y − μ, and z + k − μ' = z + k − μ − k = z − μ.
Therefore σ' = √[ ((x − μ)² + (y − μ)² + (z − μ)²)/3 ] = σ
If we add the same constant k to all data values included in a data set, we obtain a new data set whose mean is the mean of the original data set PLUS k. The standard deviation does not change.
b) We now multiply all data values by a constant k and calculate the new mean μ' and the new standard deviation σ'.
μ' = (kx + ky + kz) / 3 = kμ
σ' = √[ ((kx − kμ)² + (ky − kμ)² + (kz − kμ)²)/3 ] = √[ k² ((x − μ)² + (y − μ)² + (z − μ)²)/3 ] = |k| σ
If we multiply all data values included in a data set by a constant k, we obtain a new data set whose mean is the mean of the original data set TIMES k and whose standard deviation is the standard deviation of the original data set TIMES the absolute value of k.
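The two conclusions of solution 2 can be checked numerically. The sketch below uses data set A from exercise 1 and an arbitrary constant k = -3 picked for illustration (a negative value shows why the absolute value of k matters). It relies on `statistics.pstdev`, the population standard deviation, which divides by N like the formulas in these notes.

```python
import statistics

data = [9, 10, 11, 7, 13]   # data set A from exercise 1
k = -3                      # arbitrary constant chosen for illustration

shifted = [x + k for x in data]
scaled = [x * k for x in data]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Adding k moves the mean by k and leaves the spread unchanged.
print(statistics.mean(shifted), statistics.pstdev(shifted))
# Multiplying by k scales the mean by k and the spread by |k|.
print(statistics.mean(scaled), statistics.pstdev(scaled))
```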

3. Again, we limit the discussion to a data set with 4 values for simplicity, but the conclusions are true for any data set with quantitative data. Let x, y, z and w be the data values making up a data set with mean μ.
The standard deviation σ = √[ ((x − μ)² + (y − μ)² + (z − μ)² + (w − μ)²)/4 ]
Let σ = 0, hence √[ ((x − μ)² + (y − μ)² + (z − μ)² + (w − μ)²)/4 ] = 0
Which gives (x − μ)² + (y − μ)² + (z − μ)² + (w − μ)² = 0
All terms in the equation are non-negative and therefore the above equation is equivalent to
(x − μ)² = 0, (y − μ)² = 0, (z − μ)² = 0 and (w − μ)² = 0.
Which gives x = y = z = w = μ: all data values in a set with σ = 0 are equal.

4. a) Let xi be the i-th salary and fi be the corresponding frequency.
mean of grouped data = μ = (Σ xi*fi) / Σ fi
= (3500*5 + 4000*8 + 4200*5 + 4300*2) / (5 + 8 + 5 + 2) = $3955
b) standard deviation of grouped data = √[ (Σ (xi − μ)²*fi) / Σ fi ]
= √[ (5*(3500−3955)² + 8*(4000−3955)² + 5*(4200−3955)² + 2*(4300−3955)²) / 20 ] = 282 (rounded to the nearest unit)

5. We first find the midpoints of the given classes.

height (in cm) - classes    midpoint    frequency
120 <- 130                  125         2
130 <- 140                  135         5
140 <- 150                  145         25
150 <- 160                  155         10
160 <- 170                  165         8

a) Let mi be the midpoint of the i-th class and fi be the corresponding frequency.
mean of grouped data = μ = (Σ mi*fi) / Σ fi
= (125*2 + 135*5 + 145*25 + 155*10 + 165*8) / (2 + 5 + 25 + 10 + 8) = 148.4

b) standard deviation of grouped data = √[ (Σ (mi − μ)²*fi) / Σ fi ]
= √[ (2*(125−148.4)² + 5*(135−148.4)² + 25*(145−148.4)² + 10*(155−148.4)² + 8*(165−148.4)²) / 50 ] = 9.9
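The grouped-data calculations in solutions 4 and 5 follow the same recipe, so they can be sketched as one small Python function. The name `grouped_mean_sd` is made up for this illustration. For the height data, the class midpoints stand in for the individual heights, as in the worked solution.

```python
import math

def grouped_mean_sd(values, frequencies):
    """Mean and population SD of data given as values with frequencies."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n
    return mean, math.sqrt(variance)

# Exercise 4: salaries and their frequencies.
salary_mean, salary_sd = grouped_mean_sd([3500, 4000, 4200, 4300], [5, 8, 5, 2])
# Exercise 5: class midpoints and their frequencies.
height_mean, height_sd = grouped_mean_sd([125, 135, 145, 155, 165], [2, 5, 25, 10, 8])

print(round(salary_mean), round(salary_sd))        # 3955 282
print(round(height_mean, 1), round(height_sd, 1))  # 148.4 9.9
```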


EX: Skewness

(a) Is the histogram symmetric, left skewed or right skewed?
Right skewed
(b) Is the histogram unimodal, bimodal, or multimodal?
Unimodal
(c) Approximately how many households made more than $50,000? What does this tell you about the median income?
Approximately 60 million households made more than $50,000; the median is about $50,000.
(d) Is the mean income for 2011 higher or lower than the median income?
The mean is higher than the median.


EX: Kurtosis

Which line has the smallest kurtosis value? Largest? How do you know?

Smallest: The flat one
Largest: The tall one

Kurtosis defines the breadth and peakedness of a line.


Wk 5: Probability and z-scores

Conditional Probability
Conditional probability deals with further defining the dependence of events by looking at the probability of an event given that some other event has already occurred.
Conditional probability is denoted by the following:

P(B|A)

The above is read as the probability that B occurs given that A has already occurred. It is mathematically defined as:

P(B|A) = P(A∩B) / P(A)

Set Theory in Probability
A sample space is defined as the universal set of all possible outcomes of a given experiment.
Consider two events A and B that are part of a sample space S. This sample space can be represented as a set, as in a Venn diagram.

[Figure: Venn diagram of sample space S containing events A and B]

The different regions of the set S can be explained using the rules of probability.

Multiplication Rule (A∩B)
This region is referred to as 'A intersection B'; in probability, it refers to the event that both A and B happen. When we use the word "and" we are referring to multiplication, thus A and B can be thought of as A×B or (using dot notation, which is more popular in probability) A•B.
If A and B are dependent events, the probability of this event happening can be calculated as shown below:

P(A∩B) = P(A) • P(B|A)

If A and B are independent events, the probability of this event happening can be calculated as shown below:

P(A∩B) = P(A) • P(B)

Conditional probability for two independent events can be redefined using the relationship above to become:

P(B|A) = P(A∩B) / P(A) = P(A) • P(B) / P(A) = P(B)

The above is consistent with the definition of independent events: the occurrence of event A in no way influences the occurrence of event B, and so the probability that event B occurs given that event A has occurred is the same as the probability of event B.


Additive Rule (A∪B)
In probability we refer to the addition operator (+) as "or". Thus, when we want to define an event that occurs when A or B occurs, the probability of that event is:

P(A∪B) = P(A) + P(B) − P(A∩B)

Thus it follows that:

P(A∩B) = P(A) + P(B) − P(A∪B)

Mutual Exclusivity
Certain special pairs of events have a unique relationship referred to as mutual exclusivity.
Two events are said to be mutually exclusive if they can't occur at the same time. For a given sample space, it's either one or the other but not both. As a consequence, mutually exclusive events have their probability defined as follows:

P(A∩B) = 0

An example of mutually exclusive events is the pair of outcomes of a fair coin flip. When you flip a fair coin, you get either a head or a tail but not both, so P(H∩T) = 0. Because these two outcomes are also the only possibilities, their probabilities add to one:

P(H) + P(T) = 0.5 + 0.5 = 1

Note that a sum of one does not by itself show that two events are mutually exclusive; the defining condition is P(A∩B) = 0.

Rules of Probability for Mutually Exclusive Events

Multiplication Rule
From the definition of mutually exclusive events, we should quickly conclude the following:

P(A∩B) = 0

Addition Rule
As we defined above, the addition rule applies to mutually exclusive events as follows:

P(A∪B) = P(A) + P(B)

Conditional Probability for Mutually Exclusive Events
We have defined conditional probability with the following equation:

P(B|A) = P(A∩B) / P(A)

We can redefine the above using the multiplication rule

P(A∩B) = 0

hence

P(B|A) = 0 / P(A) = 0

[Figure: Venn diagram of a set containing two mutually exclusive events A and B]


EX:
Volunteers for a disaster relief effort were classified according to both specialty (C: construction, E: education, M: medicine) and language ability (S: speaks a single language fluently, T: speaks two or more languages fluently). The results are shown in the following two-way classification table:

Specialty    Language Ability
             S     T
C            12    1
E            4     3
M            6     2

The first row of numbers means that 12 volunteers whose specialty is construction speak a single language fluently, and 1 volunteer whose specialty is construction speaks at least two languages fluently. Similarly for the other two rows.
A volunteer is selected at random, meaning that each one has an equal chance of being chosen. Find the probability that:
a. his specialty is medicine and he speaks two or more languages;
b. either his specialty is medicine or he speaks two or more languages;
c. his specialty is something other than medicine.


Solution:
When information is presented in a two-way classification table it is typically convenient to adjoin to the table the row and column totals, to produce a new table like this:

Specialty    Language Ability    Total
             S     T
C            12    1             13
E            4     3             7
M            6     2             8
Total        22    6             28

a. The probability sought is P(M∩T). The table shows that there are 2 such people, out of 28 in all, hence P(M∩T) = 2∕28 ≈ 0.07 or about a 7% chance.

b. The probability sought is P(M∪T). The third row total and the grand total in the sample give P(M) = 8∕28. The second column total and the grand total give P(T) = 6∕28. Thus using the result from part (a),
P(M∪T) = P(M) + P(T) − P(M∩T) = 8∕28 + 6∕28 − 2∕28 = 12∕28 ≈ 0.43
or about a 43% chance.

c. This probability can be computed in two ways. Since the event of interest can be viewed as the event C ∪ E and the events C and E are mutually exclusive, the answer is, using the first two row totals,
P(C∪E) = P(C) + P(E) − P(C∩E) = 13∕28 + 7∕28 − 0∕28 = 20∕28 ≈ 0.71
On the other hand, the event of interest can be thought of as the complement Mᶜ of M, hence using the value of P(M) computed in part (b),
P(Mᶜ) = 1 − P(M) = 1 − 8∕28 = 20∕28 ≈ 0.71
as before.
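The volunteer example can be reproduced with a short Python sketch that works directly from the two-way table. Using `fractions.Fraction` keeps the probabilities exact; the dictionary layout and variable names are choices made for this illustration.

```python
from fractions import Fraction

# Rows: specialty; columns: language ability (S = one language, T = two or more).
table = {
    "C": {"S": 12, "T": 1},
    "E": {"S": 4, "T": 3},
    "M": {"S": 6, "T": 2},
}

total = sum(sum(row.values()) for row in table.values())

p_m = Fraction(sum(table["M"].values()), total)
p_t = Fraction(sum(row["T"] for row in table.values()), total)
p_m_and_t = Fraction(table["M"]["T"], total)
p_m_or_t = p_m + p_t - p_m_and_t      # additive rule
p_not_m = 1 - p_m                     # complement rule

print(p_m_and_t, p_m_or_t, p_not_m)   # 1/14 3/7 5/7
```

The fractions print in lowest terms: 1/14 = 2/28, 3/7 = 12/28, and 5/7 = 20/28, matching parts (a), (b), and (c).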


Z-Score

A z-score is the number of standard deviations from the mean a data point is
o A measure of how many standard deviations below or above the population mean a raw score is
o Placed on a normal distribution curve
o The basic z-score formula is:
z = (x – μ) / σ
o EX: let’s say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z score would be:
z = (x – μ) / σ = (190 – 150) / 25 = 1.6
The z score tells you how many standard deviations from the mean your score is. In this example, your score is 1.6 standard deviations above the mean.
o When you have multiple samples and want to describe the standard deviation of those sample means (the standard error), you would use this z score formula:
z = (x – μ) / (σ / √n)
o This z-score will tell you how many standard errors there are between the sample mean and the population mean.
o Sample problem: In general, the mean height of women is 65″ with a standard deviation of 3.5″. What is the probability of finding a random sample of 50 women with a mean height of 70″, assuming the heights are normally distributed?
z = (x – μ) / (σ / √n) = (70 – 65) / (3.5/√50) = 5 / 0.495 = 10.1
A sample mean about 10 standard errors above the population mean is extremely unlikely, so the probability is essentially 0.

For example:
o A z-score of 1 is 1 standard deviation above the mean.
o A score of 2 is 2 standard deviations above the mean.
o A score of -1.8 is 1.8 standard deviations below the mean.
o A z-score tells you where the score lies on a normal distribution curve. A z-score of zero tells you the value is exactly average, while a score of +3 tells you that the value is much higher than average.
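Both z-score formulas can be written as small Python functions and checked against the two worked examples (the test score of 190 and the sample of 50 women). The function names are made up for this sketch.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

def z_score_of_sample_mean(sample_mean, mu, sigma, n):
    """Number of standard errors between a sample mean and the population mean."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

print(z_score(190, 150, 25))                              # 1.6
print(round(z_score_of_sample_mean(70, 65, 3.5, 50), 1))  # 10.1
```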


EX: Probability

P(A) = 0.6
P(B) = 0.2
P(A or B) = ?
P(A and B) = 0.1

P(A or B) = 0.6 + 0.2 − 0.1 = 0.7
Are A and B mutually exclusive? NO

P(A) = 30%
P(B) = ?
P(A or B) = 50%
P(A and B) = 10%

P(B) = P(A or B) − P(A) + P(A and B) = 50% − 30% + 10% = 30%
Are A and B mutually exclusive? NO
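The additive rule used in the first exercise above can be sketched in Python. The function names `p_union` and `mutually_exclusive` are invented for this illustration; the numbers are the ones given in the first exercise.

```python
def p_union(p_a, p_b, p_a_and_b):
    """Additive rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def mutually_exclusive(p_a_and_b):
    """Two events are mutually exclusive exactly when P(A and B) = 0."""
    return p_a_and_b == 0

print(round(p_union(0.6, 0.2, 0.1), 2))  # 0.7
print(mutually_exclusive(0.1))           # False
```

The result is rounded only to hide floating-point noise in the decimal arithmetic.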


EX: Probability

1. The sample space for three tosses of a coin is
S = {hhh, hht, hth, htt, thh, tht, tth, ttt}
Define events
H: at least one head is observed
M: more heads than tails are observed
a. List the outcomes that comprise H and M.
b. List the outcomes that comprise H ∩ M, H ∪ M, and Hᶜ.
c. Assuming all outcomes are equally likely, find P(H∩M), P(H∪M), and P(Hᶜ).
d. Determine whether or not Hᶜ and M are mutually exclusive. Explain why or why not.

Answers:
a. H = {hhh, hht, hth, htt, thh, tht, tth}, M = {hhh, hht, hth, thh}
b. H∩M = {hhh, hht, hth, thh}, H∪M = H, Hᶜ = {ttt}
c. P(H∩M) = 4∕8, P(H∪M) = 7∕8, P(Hᶜ) = 1∕8
d. Mutually exclusive because they have no elements in common.
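The coin-toss answers can be checked by enumerating the sample space with Python sets. This is a sketch for verification; the helper `prob` assumes all eight outcomes are equally likely, as the exercise states.

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three tosses, e.g. "hht".
sample_space = {"".join(toss) for toss in product("ht", repeat=3)}

H = {s for s in sample_space if "h" in s}                      # at least one head
M = {s for s in sample_space if s.count("h") > s.count("t")}   # more heads than tails
H_complement = sample_space - H

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

print(prob(H & M), prob(H | M), prob(H_complement))  # 1/2 7/8 1/8
print(H_complement.isdisjoint(M))                    # True
```

Set intersection (`&`), union (`|`), and difference (`-`) correspond to ∩, ∪, and the complement within S.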

2. The Venn diagram provided shows a sample space and two events A and B. Suppose P(a) = 0.13, P(b) = 0.09, P(c) = 0.27, P(d) = 0.20, and P(e) = 0.31. Confirm that the probabilities of the outcomes add up to 1, then compute the following probabilities.
a. P(A).
b. P(B).
c. P(Aᶜ) two ways: (i) by finding the outcomes in Aᶜ and adding their probabilities, and (ii) using the Probability Rule for Complements.
d. P(A∩B).
e. P(A∪B) two ways: (i) by finding the outcomes in A ∪ B and adding their probabilities, and (ii) using the Additive Rule of Probability.

Answers:
a. 0.36
b. 0.78
c. 0.64
d. 0.27
e. 0.87


3. Confirm that the probabilities in the two-way contingency table add up to 1, then use it to find the probabilities of the events indicated.

     U       V       W
A    0.15    0.00    0.23
B    0.22    0.30    0.10

a. P(A), P(B), P(A∩B).
b. P(U), P(W), P(U∩W).
c. P(U∪W).
d. P(Vᶜ).
e. Determine whether or not the events A and U are mutually exclusive; the events A and V.

Answers:
a. P(A) = 0.38, P(B) = 0.62, P(A∩B) = 0
b. P(U) = 0.37, P(W) = 0.33, P(U∩W) = 0
c. 0.7
d. 0.7
e. A and U are not mutually exclusive because P(A∩U) is the nonzero number 0.15. A and V are mutually exclusive because P(A∩V) = 0.

13. The sample space that describes all three-child families according to the genders of the children with respect to birth order is
S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}.
For each of the following events in the experiment of selecting a three-child family at random, state the complement of the event in the simplest possible terms, then find the outcomes that comprise the event.
A: At least one child is a girl.
B: At most one child is a girl.
C: All of the children are girls.
D: Exactly two of the children are girls.
E: The first born is a girl.

Answers:
A: Complement: “All the children are boys.” Event: {bbg, bgb, bgg, gbb, gbg, ggb, ggg}
B: Complement: “At least two of the children are girls” or “There are two or three girls.” Event: {bbb, bbg, bgb, gbb}
C: Complement: “At least one child is a boy.” Event: {ggg}
D: Complement: “There are either no girls, exactly one girl, or three girls.” Event: {bgg, gbg, ggb}
E: Complement: “The first born is a boy.” Event: {gbb, gbg, ggb, ggg}


11. The breakdown of the students enrolled in a university course by class (F: freshman, So: sophomore, J: junior, Se: senior) and academic major (S: science, mathematics, or engineering, L: liberal arts, O: other) is shown in the two-way classification table.

Major    Class
         F      So     J      Se
S        92     42     20     13
L        368    167    80     53
O        460    209    100    67

A student enrolled in the course is selected at random. Adjoin the row and column totals to the table and use the expanded table to find the probability of each of the following events.
a. The student is a freshman.
b. The student is a liberal arts major.
c. The student is a freshman liberal arts major.
d. The student is either a freshman or a liberal arts major.
e. The student is not a liberal arts major.

Answers:
a. 920/1671
b. 668/1671
c. 368/1671
d. 1220/1671
e. 1003/1671
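Adjoining the row and column totals is mechanical, so it can be sketched in Python and used to confirm the five answers. The data layout and variable names are choices made for this illustration.

```python
from fractions import Fraction

classes = ["F", "So", "J", "Se"]
counts = {
    "S": [92, 42, 20, 13],
    "L": [368, 167, 80, 53],
    "O": [460, 209, 100, 67],
}

# Adjoin the totals: one per major (row), one per class (column), and the grand total.
row_totals = {major: sum(row) for major, row in counts.items()}
col_totals = {c: sum(row[i] for row in counts.values())
              for i, c in enumerate(classes)}
grand_total = sum(row_totals.values())

p_f = Fraction(col_totals["F"], grand_total)        # a. freshman
p_l = Fraction(row_totals["L"], grand_total)        # b. liberal arts major
p_f_and_l = Fraction(counts["L"][0], grand_total)   # c. freshman liberal arts major
p_f_or_l = p_f + p_l - p_f_and_l                    # d. additive rule
p_not_l = 1 - p_l                                   # e. complement rule

print(grand_total)  # 1671
```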

More resources: https://catalog.flatworldknowledge.com/bookhub/reader/3318?e=fwk-shafer-ch03_s02


EX: Z scores

1. A normal distribution of scores has a standard deviation of 10. Find the z-scores corresponding to each of the following values:

a) A score that is 20 points above the mean. z = 2
b) A score that is 10 points below the mean. z = -1
c) A score that is 15 points above the mean. z = 1.5
d) A score that is 30 points below the mean. z = -3

2. The Welcher Adult Intelligence Test Scale is composed of a number of subtests. On one subtest, the raw scores have a mean of 35 and a standard deviation of 6. Assuming these raw scores form a normal distribution:

a) What number represents the 65th percentile (what number separates the lower 65% of the distribution)? 37.31

b) What number represents the 90th percentile? 42.71

c) What is the probability of getting a raw score between 28 and 38? 57%

d) What is the probability of getting a raw score between 41 and 44? 9%
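The answers to this problem can be reproduced without a z table using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for checking work; the variable names are illustrative, and results can differ from table-based answers in the second decimal because tables round z.

```python
from statistics import NormalDist

# Raw scores on the subtest: mean 35, standard deviation 6.
subtest = NormalDist(mu=35, sigma=6)

# Percentiles: the raw score below which a given share of scores falls.
p65 = subtest.inv_cdf(0.65)
p90 = subtest.inv_cdf(0.90)

# Probability of a score in a range = difference of cumulative probabilities.
between_28_38 = subtest.cdf(38) - subtest.cdf(28)
between_41_44 = subtest.cdf(44) - subtest.cdf(41)

print(round(p65, 2), round(p90, 2))
print(round(between_28_38, 2), round(between_41_44, 2))
```

The 90th percentile comes out near 42.69 here rather than 42.71 because the exact z value (about 1.2816) is used instead of a rounded table entry.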

3. Scores on the SAT form a normal distribution with μ = 500 and σ = 100.
a) What is the minimum score necessary to be in the top 15% of the SAT distribution? 604

b) Find the range of values that defines the middle 80% of the distribution of SAT scores. The z-scores are -1.28 and 1.28, so the range is 372 to 628.

4. For a normal distribution, find the z-score that separates the distribution as follows:
a) Separate the highest 30% from the rest of the distribution. .52

b) Separate the lowest 40% from the rest of the distribution. -.25

c) Separate the highest 75% from the rest of the distribution. -.67

5. For the numbers below, find the area between the mean and the z-score:
a) z = 1.17: .38
b) z = -1.37: .41

6. For the z-scores below, find the percentile rank (percent of individuals scoring below):
a) -0.47: 31.9th percentile

b) 2.24: 98.8th percentile

7. For the numbers below, find the percent of cases falling above the z-score:
a) 0.24: 41%

b) -2.07: 98%

8. A patient recently diagnosed with Alzheimer’s disease takes a cognitive abilities test and scores a 45. The mean on this test is 52 and the standard deviation is 5. What is the patient’s percentile rank? 8.1%

9. A fifth grader takes a standardized achievement test (mean = 125, standard deviation = 15) and scores a 148. What is the child’s percentile rank? 94%

10. Pat and Chris both took a spatial abilities test (mean = 80, std. dev. = 8). Pat scored a 76 and Chris scored a 94. What percent of individuals would score between Pat and Chris? 65%

11. A normal distribution of scores has a standard deviation of 10. Find the z-scores corresponding to each of the following values:

a) A score of 60, where the mean score of the sample data values is 40. z = 2
b) A score that is 30 points below the mean. z = -3
c) A score of 80, where the mean score of the sample data values is 30. z = 5
d) A score of 20, where the mean score of the sample data values is 50. z = -3

12. IQ scores have a mean of 100 and a standard deviation of 16. Albert Einstein reportedly had an IQ of 160.

a. What is the difference between Einstein’s IQ and the mean? 60

b. How many standard deviations is that? 3.75

c. Convert Einstein’s IQ score to a z score. (160 – 100)/16 = 3.75

d. If we consider “usual” IQ scores to be those that convert to z scores between -2 and 2, is Einstein’s IQ usual or unusual? Unusual

13. Women’s heights have a mean of 63.6 in. and a standard deviation of 2.5 inches. Find the z score corresponding to a woman with a height of 70 inches and determine whether the height is unusual. z = (70 – 63.6)/2.5 = 2.56. Because the z score is greater than 2, the height is unusual.

14. Three students take equivalent stress tests. Which is the highest relative score (meaning which has the largest z score value)? C has the highest z-score.

a. A score of 144 on a test with a mean of 128 and a standard deviation of 34. z = .47

b. A score of 90 on a test with a mean of 86 and a standard deviation of 18. z = .22

c. A score of 18 on a test with a mean of 15 and a standard deviation of 5. z = .6
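The comparison in problem 14 and the "usual or unusual" rule from problem 12 can be sketched with one small function. The function names `z_score` and `is_unusual` are made up for this illustration; the cutoff of 2 standard deviations is the rule stated in problem 12.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z, cutoff=2):
    """A score is 'unusual' when its z score falls outside -cutoff..cutoff."""
    return abs(z) > cutoff

# Problem 14: compare three scores that come from different scales.
stress_tests = {
    "a": z_score(144, 128, 34),
    "b": z_score(90, 86, 18),
    "c": z_score(18, 15, 5),
}
highest = max(stress_tests, key=stress_tests.get)

# Problem 12: Einstein's reported IQ of 160 (mean 100, sd 16).
einstein_z = z_score(160, 100, 16)

print({k: round(v, 2) for k, v in stress_tests.items()}, highest)
print(einstein_z, is_unusual(einstein_z))
```

Converting to z scores first is what makes the three tests comparable, since their raw means and standard deviations differ.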

Wk 6: T-Score

Population            Sample
Descriptive stats     Inferential stats
z-score               t-score

z = (x – μ) / σ

Standard error:
o The standard deviation of the sample means; it measures how far a sample mean is likely to fall from the population mean

How to decrease error
o Larger samples will decrease error
o Larger samples are more reflective of population characteristics

T-distribution: a sampling distribution, similar to the z distribution
o Assumed mean of population lies at center
o Resembles a normal distribution
o All sample means fall under this distribution
o When N is less than 30, the t-distribution differs noticeably from the z distribution
o When N is greater than 30, it looks like the z distribution

The t score formula enables you to take an individual score and transform it into a standardized form — one which helps you to compare scores.

You’ll want to use the t score formula when you don’t know the population standard deviation and you have a small sample (under 30).

The t score formula is:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

where

x̄ = sample mean
μ0 = population mean
s = sample standard deviation
n = sample size

If you have only one item in your sample, the square root in the denominator becomes √1. This means the formula becomes:

\[ t = \frac{\bar{x} - \mu_0}{s} \]

In simple terms, the larger the t score, the larger the difference is between the groups you are testing. It’s influenced by many factors including:

o How many items are in your sample.
o The means of your sample.
o The mean of the population from which your sample is drawn.
o The standard deviation of your sample.

You traditionally look up a t score in a t-table.
o Degrees of freedom: the number of items in your sample, minus one. For example, if you have 20 items in your sample, then df = 19.
o You use the degrees of freedom along with the confidence level you are willing to accept to decide whether to support or reject the null hypothesis.

The t score formula can also be used to solve probability questions. You won’t have an alpha level, but you can use the result from the formula.

EX: T-score (11.4)
Population mean: 2000
Sample size: 10
Mean of sample: 2,200 gal/min
Sd of sample: 500

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{N}} \]

\[ t = \frac{2200 - 2000}{500 / \sqrt{10}} = 1.26 \]

We cannot say with 90% confidence that the mean of the new pumps is higher than that of the old pumps, because t = 1.26 is below the one-tailed critical value of 1.383 for 9 degrees of freedom.
*df = n - 1 if n is less than 30
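The pump calculation can be sketched in Python as follows. The function name `t_score` is made up for this illustration, and the critical value 1.383 (one-tailed, 90% confidence, 9 degrees of freedom) is a t-table lookup that is typed in by hand as an assumption, not computed.

```python
from math import sqrt

def t_score(sample_mean, pop_mean, sample_sd, n):
    """One-sample t: distance of the sample mean from the population mean,
    measured in standard errors (sample_sd / sqrt(n))."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

t = t_score(sample_mean=2200, pop_mean=2000, sample_sd=500, n=10)
df = 10 - 1

# Assumed t-table value: one-tailed, 90% confidence, df = 9.
t_critical = 1.383
reject_null = t > t_critical

print(round(t, 2), df, reject_null)
```

Since the computed t is below the tabled value, `reject_null` is `False`, which matches the conclusion in the notes.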

Confidence Interval
\[ CI = \bar{x} \pm t_{x\%} \cdot \text{s.e.} \]

\[ CI = 2200 \pm 1.833 \cdot \frac{500}{\sqrt{10}} \] for 90% confidence and 9 degrees of freedom

1910.18 to 2489.82
We are 90% confident that the new pump average will fall in this range. Remember that a two-tailed interval splits the remaining probability between the two tails, so for 90% confidence look up .05 in a one-tailed t table.

\[ CI = 2200 \pm 2.262 \cdot \frac{500}{\sqrt{10}} \] for 95% confidence and 9 degrees of freedom

1842.35 to 2557.65
We are 95% confident that the new pump average will fall in this range.

The 95% interval is wider because being more sure that the average falls inside the range requires a wider range.
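A minimal sketch of the interval calculation for the pump example. The function name is hypothetical, and the critical values are standard t-table entries for 9 degrees of freedom that are typed in by hand (1.833 for 90%, 2.262 for 95%), not computed.

```python
from math import sqrt

def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (low, high) for sample_mean +/- t_critical * standard error."""
    margin = t_critical * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed two-tailed t-table values for df = 9.
low90, high90 = t_confidence_interval(2200, 500, 10, t_critical=1.833)
low95, high95 = t_confidence_interval(2200, 500, 10, t_critical=2.262)

print(round(low90, 2), round(high90, 2))
print(round(low95, 2), round(high95, 2))
```

Passing the critical value in as a parameter keeps the sketch free of any statistics library; the same function works for a z-based interval by passing 1.96 instead.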

Hypothesis Testing
o Hypothesis: something is different
o Null hypothesis: nothing is different
o If our results would be very improbable under the null hypothesis, then we can reject the null

EX: T-score (11.6)
We know that the average weight gain for cattle last year was 12.7 lbs. A sample of 30 cattle is fed cement feed and their average gain was 14.1 lbs with an s of 5 lbs.
H: cement will increase weight gain to more than 12.7 lbs
H0: cement will not increase weight gain to more than 12.7 lbs

Sample mean: 14.1
Sd: 5
Se: 5/sqrt(30) = .91287
Critical t score at 95% (df = 29): 1.699

\[ t = \frac{14.1 - 12.7}{5 / \sqrt{30}} = 1.533 \]

Because 1.533 is less than the critical value of 1.699, we fail to reject the null hypothesis; the sample does not support the research hypothesis.

Alpha score/values
o α values help us establish confidence.
o α = .01 means we are 99% confident
o α = .1 means that we are 90% confident

Errors
o Type 1 error: false positive: reject null when null is true
   Sending an innocent man to jail
o Type 2 error: false negative: fail to reject null when null is false
   Ruling a guilty man innocent
   Telling you a result is negative when it is actually positive

How to Correct Errors
o Reduce type 1 error by making the alpha value smaller
o This increases type 2 error
o Reduce type 2 error by increasing sample size

EX 11.10
5% or less fraudulent receipts
Sample of 10
Mean of 4.7
Sd 1.2
H1: The average district office has 5% or fewer fraudulent receipts
H0: The average district office does not have 5% or fewer fraudulent receipts

\[ t = \frac{5 - 4.7}{1.2 / \sqrt{10}} = .79 \]

The critical t score is 1.833 for 95% confidence (df = 9). Because .79 is less than 1.833, we fail to reject the null; the sample does not support the research hypothesis.

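Sample size:

\[ n = \left( \frac{z \cdot s}{E} \right)^2 \]

where z is the z score for the desired confidence level, s is the standard deviation, and E is the margin of error you are willing to accept.

EX: Sample Size
If you want to be 95% certain you are within $50
Sd = 250

(1.96(250)/(50))^2 = 96.04, so the sample size I need is 97 (always round up)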
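The same calculation as a short Python sketch. The function name and parameter names are made up for this illustration; 1.96 is the usual z value for 95% confidence. The result is rounded up because a fractional respondent is not possible and rounding down would miss the target precision.

```python
from math import ceil

def sample_size_for_mean(z, sd, margin):
    """Smallest whole sample size n such that z * sd / sqrt(n) <= margin."""
    raw = (z * sd / margin) ** 2
    return ceil(raw)

n_needed = sample_size_for_mean(z=1.96, sd=250, margin=50)
print(n_needed)
```

Halving the acceptable margin roughly quadruples the required sample, because the margin is squared in the denominator.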

EX: T-Scores
A research study was conducted to examine the differences between older and younger adults on perceived life satisfaction. A pilot study was conducted to examine this hypothesis. Ten older adults (over the age of 70) and ten younger adults (between 20 and 30) were given a life satisfaction test (known to have high reliability and validity). Scores on the measure range from 0 to 60 with high scores indicative of high life satisfaction; low scores indicative of low life satisfaction. The data are presented below. Compute the appropriate t-test.

Older Adults    Younger Adults
45              34
38              22
52              15
48              27
25              37
39              41
51              24
46              19
55              26
46              36

Mean =          Mean =
S =             S =
S2 =            S2 =

1. What is your computed answer?
2. What would be the null hypothesis in this study?
3. What would be the alternate hypothesis?
4. What probability level did you choose and why?
5. What is your tcrit?
6. Is there a significant difference between the two groups?
7. Interpret your answer.
8. If you have made an error, would it be a Type I or a Type II error? Explain your answer.

Older Adults           Younger Adults
Mean = 44.5            Mean = 28.1
S = 8.682677518        S = 8.543353492
S2 = 75.388888888      S2 = 72.988888888

Independent t-test
1. What is your computed answer? tobs = 4.257

2. What would be the null hypothesis in this study? The null hypothesis would be that there are no significant differences between younger and older adults on life satisfaction.

3. What would be the alternate hypothesis? The alternate hypothesis would be that life satisfaction scores of older and younger adults are different.

4. What probability level did you choose and why? .05 - if one makes either a Type I or a Type II error, there will be no major risk involved.

5. What is your tcrit? tcrit = 2.101

6. Is there a significant difference between the two groups? Yes, the tobs is in the tail. In fact, even if one uses a .001 probability level, the t is still in the tail. Thus, we conclude that we are 99.9 percent sure that there is a significant difference between the two groups.

7. Interpret your answer. Older adults in this sample have significantly higher life satisfaction than younger adults (t = 4.257, p < .001). As this is a quasi-experiment, we can not make any statements concerning the cause of the difference.

8. If you have made an error, would it be a Type I or a Type II error? Explain your answer. If an error was made, it would have to be a Type I error; there really are no differences in life satisfaction between younger and older adults. We just got these results by chance.
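The independent t-test for the life satisfaction data can be reproduced with the standard library. This is a sketch that assumes a pooled-variance (equal variances) t-test, which is consistent with the reported tcrit for 18 degrees of freedom; the function name is hypothetical, and tcrit = 2.101 is taken from the answer key rather than computed.

```python
from math import sqrt
from statistics import mean, variance

older = [45, 38, 52, 48, 25, 39, 51, 46, 55, 46]
younger = [34, 22, 15, 27, 37, 41, 24, 19, 26, 36]

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (S2 in the notes).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df

t_obs, df = independent_t(older, younger)
t_crit = 2.101  # two-tailed, alpha = .05, df = 18 (from the answer key)

print(round(t_obs, 3), df, abs(t_obs) > t_crit)
```

`statistics.variance` uses the n - 1 denominator, so it matches the S2 values listed in the answer key.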

A researcher hypothesizes that electrical stimulation of the lateral habenula will result in a decrease in food intake (in this case, chocolate chips) in rats. Rats undergo stereotaxic surgery and an electrode is implanted in the right lateral habenula. Following a ten-day recovery period, rats (kept at 80 percent body weight) are tested for the number of chocolate chips consumed during a 10-minute period both with and without electrical stimulation. The testing conditions are counterbalanced. Compute the appropriate t-test for the data provided below.

Stimulation    No Stimulation
12             8
7              7
3              4
11             14
8              6
5              7
14             12
7              5
9              5
10             8

Mean =         Mean =
S =            S =
S2 =           S2 =

1. What is your computed answer?
2. What would be the null hypothesis in this study?
3. What would be the alternate hypothesis?
4. What probability level did you choose and why?
5. What were your degrees of freedom?
6. Is there a significant difference between the two testing conditions?
7. Interpret your answer.
8. If you have made an error, would it be a Type I or a Type II error? Explain your answer.

Stimulation            No Stimulation
Mean = 8.6             Mean = 7.6
S = 3.306559138        S = 3.169297153
S2 = 10.933333333      S2 = 10.044444444

Correlated t-test
1. What is your computed answer? tobs = 1.315

2. What would be the null hypothesis in this study? Electrical stimulation of the lateral habenula has no impact on food intake; there will be no difference in the amount of chocolate chips consumed.

3. What would be the alternate hypothesis? Electrical stimulation of the lateral habenula will have an impact on food intake either increasing or decreasing the amount of chocolate chips consumed.

4. What probability level did you choose and why? .05 There is little risk involved if either a Type I or a Type II error is made.

5. What were your degrees of freedom? N - 1 = 9

6. Is there a significant difference between the two testing conditions? There is no significant difference between the amount of chocolate chips consumed. The tobs falls in the middle section of the t-distribution.

7. Interpret your answer. Electrical stimulation appears to have no impact on the amount of chocolate chips consumed by the rat (t=1.315, not significant).

8. If you have made an error, would it be a Type I or a Type II error? Explain your answer. If an error was made, it would have to be a Type II error as we found no differences. It may be that the lateral habenula does play a role in food intake but we failed to demonstrate it with this study/sample.
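Because each rat is measured under both conditions, the correlated (paired) t-test works on the per-rat differences. The sketch below uses only the standard library; the function name is hypothetical, and the critical value 2.262 (two-tailed, alpha = .05, df = 9) is a t-table lookup typed in by hand as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

stimulation = [12, 7, 3, 11, 8, 5, 14, 7, 9, 10]
no_stimulation = [8, 7, 4, 14, 6, 7, 12, 5, 5, 8]

def paired_t(a, b):
    """Paired t statistic and degrees of freedom from matched observations."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Standard error of the mean difference.
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

t_obs, df = paired_t(stimulation, no_stimulation)
t_crit = 2.262  # assumed table value: two-tailed, alpha = .05, df = 9

print(round(t_obs, 3), df, abs(t_obs) > t_crit)
```

The unrounded statistic is about 1.3156, which the answer key reports as 1.315; either way it falls well short of the critical value.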

More resources:
http://faculty.webster.edu/woolflm/ttest.html
http://www.stat.ucla.edu/~nchristo/statistics13/stat13_confidence_intervals_answers.pdf
https://www.stat.wisc.edu/~yandell/st571/handouts/disc7.pdf

EX: Confidence Interval
You want to rent an unfurnished one-bedroom apartment in Durham, NC next year. The mean monthly rent for a random sample of 60 apartments advertised on Craig’s List (a website that lists apartments for rent) is $1000. Assume a population standard deviation of $200. Construct a 95% confidence interval.

($1000 – 1.96*$200/sqrt(60), $1000 + 1.96*$200/sqrt(60)) = ($949.39, $1050.61)
We are 95% confident that the interval ($949.39, $1050.61) covers the true mean monthly rent of apartments listed in Durham.

EX: Confidence Interval, Sample Size
The UCLA housing office wants to estimate the mean monthly rent for studios around the campus. A random sample of size n = 31 studios is taken from the area around UCLA. The sample mean is found to be x̄ = $1300 with sample standard deviation s = $150. Assume that these data are selected from a normal distribution.

a. Construct a 90% confidence interval for the mean monthly rent of studios in the area around UCLA.
Answer: Using the t distribution with 30 degrees of freedom,
\[ \bar{x} \pm t_{\alpha/2;\,n-1} \frac{s}{\sqrt{n}} \Rightarrow 1300 \pm 1.697 \cdot \frac{150}{\sqrt{31}} \Rightarrow 1300 \pm 45.7 \Rightarrow 1254.3 \le \mu \le 1345.7 \]

b. Construct a 99% confidence interval for the mean monthly rent of studios in the area around UCLA.
Answer:
\[ \bar{x} \pm t_{\alpha/2;\,n-1} \frac{s}{\sqrt{n}} \Rightarrow 1300 \pm 2.750 \cdot \frac{150}{\sqrt{31}} \Rightarrow 1300 \pm 74.1 \Rightarrow 1225.9 \le \mu \le 1374.1 \]

c. What sample size is needed so that the length of the interval is $60 with 95% confidence? Assume σ = $150.

EX: Sample Size

5. (10 marks) Suppose a consumer advocacy group would like to conduct a survey to find the proportion p of consumers who bought the newest generation of an MP3 player and were happy with their purchase.
a) How large a sample n should they take to estimate p with 2% margin of error and 90% confidence?
b) The advocacy group took a random sample of 1000 consumers who recently purchased this MP3 player and found that 400 were happy with their purchase. Find a 95% confidence interval for p.
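The notes do not give a worked answer for this exercise, so the following is only one possible approach, sketched under stated assumptions: the usual normal-approximation formulas for a proportion, z = 1.645 for 90% confidence, z = 1.96 for 95% confidence, and the conservative guess p = 0.5 when no prior estimate of p is available. The function names are made up for this illustration.

```python
from math import ceil, sqrt

def sample_size_for_proportion(z, margin, p=0.5):
    """Smallest whole n with z * sqrt(p * (1 - p) / n) <= margin.
    p = 0.5 is the most conservative choice (largest required n)."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def proportion_confidence_interval(successes, n, z):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Part a: 2% margin of error, 90% confidence (assumed z = 1.645, p = 0.5).
n_needed = sample_size_for_proportion(z=1.645, margin=0.02)

# Part b: 400 of 1000 happy, 95% confidence (assumed z = 1.96).
low, high = proportion_confidence_interval(successes=400, n=1000, z=1.96)

print(n_needed)
print(round(low, 4), round(high, 4))
```

If a course uses a different rounding of z (for example 1.64 or 1.65), the required sample size in part a shifts slightly.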

EX: Type I/II Errors
It has been shown many times that on a certain memory test, recognition is substantially better than recall. However, the probability value for the data from your sample was .12, so you were unable to reject the null hypothesis that recall and recognition produce the same results. What type of error did you make?
Type I
Type II

In the population, there is no difference between men and women on a certain test. However, you found a difference in your sample. The probability value for the data was .03, so you rejected the null hypothesis. What type of error did you make?
Type I
Type II

Beta, β, is the probability of which kind of error?
Type I
Type II

As the alpha level gets lower, which error rate also gets lower?
Type I
Type II

If the null hypothesis is in reality false, which kind of error is not possible?
Type I
Type II

It has been shown many times that on a certain memory test, recognition is substantially better than recall. However, the probability value for the data from your sample was .12, so you were unable to reject the null hypothesis that recall and recognition produce the same results. What type of error did you make?

Type I: That's not quite right. A type I error can only occur if the significance test results in a p value small enough to reject the null hypothesis, because a type I error is when you reject the null hypothesis when in fact it is true.

Type II: That's correct. In this example, there is an actual difference in the population between recognition and recall, but you did not find a significant difference in your sample. Failing to reject a false null hypothesis is a type II error.

In the population, there is no difference between men and women on a certain test. However, you found a difference in your sample. The probability value for the data was .03, so you rejected the null hypothesis. What type of error did you make?

Type I: That's correct. There is no difference in the population, but you found a difference in your sample. A Type I error occurs when a significance test results in the rejection of a true null hypothesis.

Type II: That's not quite right. A type II error can only occur if the significance test fails to reject the null hypothesis. A type II error is related to power. Sometimes it's helpful to note that a type II error would occur when a study fails to reject the null hypothesis when there was not enough power to find the TRUE difference.

Beta, β, is the probability of which kind of error?

Type I: That's not quite right. Note that β is the second letter in the Greek alphabet, suggesting it is the probability of the second kind of error, type II.

Type II: That's correct. The probability of a type II error is called beta, β. The probability of correctly rejecting a false null hypothesis equals 1 - β and is called power.

As the alpha level gets lower, which error rate also gets lower?

Type I: That's correct. The type I error rate is affected by the alpha level; the lower the alpha level is, the lower the type I error rate gets. Alpha is the probability of a type I error given that the null hypothesis is true. When you specify an alpha level ahead of time, you are specifying the level of chance of a Type I error which you are willing to allow.

Type II: That's not quite right. Alpha is the probability of rejecting the null hypothesis when in fact it is true.

If the null hypothesis is in reality false, which kind of error is not possible?

Type I: That's correct. A Type I error occurs when a significance test results in the rejection of a TRUE null hypothesis.

Type II: That's not quite right. A type II error can only occur when the null hypothesis is false. Remember that a type II error is not really an error, but rather reflects an inconclusive result, a lack of power to detect the true alternate state of the population.

STATA: CH 1

STATA: CH 2

STATA CH 3

STATA: CH 4

STATA CH 5

[Figure: frequency histograms of www hours per week, graphed by respondent's sex (panels: 1. male, 2. female)]

STATA CH 6

STATA CH 7

More Resources
http://turner.faculty.swau.edu/mathematics/math241/materials/practice/?02
http://www.stat.ufl.edu/~mripol/PracticeExams2023/PQ2.pdf