preliminary concepts


Upload: jelani-nolan

Post on 02-Jan-2016



Preliminary Concepts

Why Do We Need Statistics?

Statistics is about making decisions. Consider the examples below.

You would like to know whether your English skills are good enough to take psychology courses in English. Should you focus on practicing English and take more courses on reading and writing? The simplest way to find out is to take a test and see your score. Let's say it is 110. Is that a good score?

You need to buy a simple calculator. How much will you pay for it? Is there a shop in your neighborhood where you can buy the same calculator at a lower price? Let's say there are sixteen shops in your town. How will you decide which one is the cheapest?

Which is the better teaching technique: giving out the course notes and presentations, or requiring students to take notes during class?

You would like to know which goalkeeper performed better last season. You need to count, for each goalkeeper, the number of shots stopped, the number of penalty kicks saved, and the number of games played.

You hate ÖSYM and the university entrance exam. You believe that the LYS says nothing about a student's future success at university. That is, students' GPA (Grade Point Average) cannot be predicted from their LYS scores. How can you prove it?

Even in everyday life, we need to make decisions under ambiguous conditions or when there is a huge amount of information. In such conditions, we need an effective tool to organize the information that we have. Statistics provides such a mathematical tool, with which we can summarize the existing information and make predictions or inferences.

Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A., doi:10.1126/science.1218685).

Statisticians apply statistical thinking and methods to a wide variety of scientific, social, and business endeavors in such areas as astronomy, biology, education, economics, engineering, genetics, marketing, medicine, psychology, public health, and sports, among many others.

"The best thing about being a statistician is that you get to play in everyone else's backyard." (John Tukey, Bell Labs, Princeton University)

Basic Concepts: Descriptive and Inferential Statistics

Two kinds of statistics can be distinguished: descriptive and inferential.

Descriptive statistics (deduction) is the discipline of quantitatively describing the main features of a collection of data. Suppose that you visited each shop in your town and checked the prices of the calculators.

Let's say you decided to buy a Casio. Now we need to rearrange our table to see the lowest price for Casio.

As you can see, Bizim Sokaktaki tezgah offers the best price. Looking closely at the table, we can see other characteristics of the distribution. The most common price for Casio is 6 TL. Thirty-three percent of the shops sell Casio for 6 TL (5/15 × 100 = 33.33). The highest price for Casio is 7 TL, and twenty percent of the shops offer that price (3/15 × 100 = 20.00).

Based on the present table, several other deductions could be made: for instance, which shop offers the best price for Yumatu, or in which shop the difference between the prices of Sharp and Yumatu is largest.
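The percentages above come from a simple frequency table. A minimal sketch in Python is given below. The price list is hypothetical: only the counts for 6 TL (five shops) and 7 TL (three shops) out of fifteen shops are taken from the text, and the remaining prices are made up for illustration.

```python
from collections import Counter

# Hypothetical Casio prices (TL) in 15 shops. Only the counts for 6 TL and
# 7 TL follow the text; the other prices are invented for this sketch.
casio_prices = [4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7]


def frequency_table(values):
    """Return {value: (count, percent of all values)} ordered by value."""
    counts = Counter(values)
    total = len(values)
    return {v: (c, round(c / total * 100, 2)) for v, c in sorted(counts.items())}


table = frequency_table(casio_prices)
for price, (count, percent) in table.items():
    print(f"{price} TL: {count} shops ({percent}%)")

# The most common price (the mode) and the lowest price are two simple
# descriptive summaries of the same data.
most_common_price = Counter(casio_prices).most_common(1)[0][0]
lowest_price = min(casio_prices)
print("Most common price:", most_common_price, "TL")
print("Lowest price:", lowest_price, "TL")
```

Nothing here goes beyond counting and dividing, which is the point: descriptive statistics reorganizes the data that is already in hand.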

The main purpose of descriptive statistics is to organize and summarize the data. In this way, we can describe several characteristics of our sample or population.

Basic Concepts: Inferential Statistics

Inferential statistics (induction) aims to make predictions based on the analysis of numeric data. Inferential statistics is about probability. With the aid of inferential statistics, we can see whether our predictions are better than chance.

Let's go back to our example about your English skills. When you get a certain score on a test (110 points in our example), at least three questions arise:

Q1: Is this your true score?
Q2: What is the meaning of your score?
Q3: Can we take the score on this test as a predictor of prospective (future) success in psychology courses?

Q1: Is this your true score? Were you tired when you took the test? Did the test cover subjects that you are very familiar with? Or were you simply lucky? (Lucky guesses are an inevitable part of multiple-choice tests.)

One way to see whether your score was affected by chance or other factors is to take the same test again or to take an equivalent test. Of course, you would learn the items if you took the same test repeatedly. Let's say you found equivalent tests and completed ten of them.

So, which of them is your true score? Should we accept the mean as your true score? Note, however, that you never actually scored 100.45, and it does not even seem possible to get such a score.
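The idea of estimating a true score from repeated tests can be sketched as follows. The ten scores are made up for illustration and do not reproduce the table from the slides. The standard error used here is one common way to express how uncertain the mean is; it is an addition for illustration, not something the slides compute.

```python
import math
import statistics

# Ten hypothetical scores on equivalent tests (invented for this sketch).
scores = [98, 103, 110, 95, 101, 99, 104, 97, 102, 96]

mean_score = statistics.mean(scores)      # best single guess at the true score
spread = statistics.stdev(scores)         # sample standard deviation
standard_error = spread / math.sqrt(len(scores))

print(f"Mean score: {mean_score}")
print(f"Standard deviation: {spread:.2f}")
print(f"Standard error of the mean: {standard_error:.2f}")

# The mean need not be a score that was ever observed.
print("Mean was actually observed:", mean_score in scores)
```

As in the text, the mean of these scores is a value that never appeared on any single test, yet it is still a reasonable estimate of the score around which the observed results vary.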

So, the first thing we need to do is to find out (predict) your true score.

Q2: What is the meaning of your score? What is the range of scores that could be obtained on the test? Let's say the possible scores range from 25 to 150. Is that enough to say your score is OK?

What you need to decide on is a reference point. If you find a way to compare your score with a particular reference score, you can decide whether your English is good or bad. There could be two kinds of reference points.

First, you can ask your classmates to complete the same test and simply evaluate your rank among their scores. Let's say you did better than sixty percent of your classmates on that test. Shall we take that as evidence of your superiority in English? Let's examine the table.

As you can see, 30% of your classmates got the highest score. The difference between that higher score and your score is 36 points. The difference between your score and the lower score is 1 point.
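A small sketch shows why a rank alone can mislead. The classmates' scores below are hypothetical, chosen only so that they match the figures quoted in the text (60% below you, 30% at the top score, gaps of 36 points and 1 point); the original table is not reproduced here.

```python
# Hypothetical scores for ten classmates, picked to match the percentages
# and gaps mentioned in the text.
classmate_scores = [146, 146, 146, 110, 109, 109, 109, 109, 109, 109]
my_score = 110


def percent_below(score, others):
    """Percentage of scores in `others` that are strictly lower than `score`."""
    return 100 * sum(s < score for s in others) / len(others)


rank_percent = percent_below(my_score, classmate_scores)
top_score = max(classmate_scores)
top_share = 100 * classmate_scores.count(top_score) / len(classmate_scores)
gap_to_top = top_score - my_score
gap_to_lower = my_score - max(s for s in classmate_scores if s < my_score)

print(f"You scored higher than {rank_percent:.0f}% of classmates")
print(f"{top_share:.0f}% of classmates share the top score of {top_score}")
print(f"Points behind the top group: {gap_to_top}")
print(f"Points ahead of the group below: {gap_to_lower}")
```

Being ahead of 60% of the class sounds strong, but the gaps show that your score sits right next to the lower group and far from the top one.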

Do you still think your proficiency is better than that of most of your classmates?

As a second way, you can compare your score with national cut points. For instance, test developers might publish a chart for interpreting the scores: 20-70 beginner, 71-90 intermediate, 91-110 upper-intermediate, 111-150 advanced.

Your original score was 110. That score is the upper limit for upper-intermediate. That is, your score is right on the border between upper-intermediate and advanced. According to the manual, you should be categorized as upper-intermediate. Do you agree with that?

So, the second thing that we need to infer is whether your score differs significantly from a meaningful reference point.

Q3: Can we take the score on this test as a predictor of prospective (future) success in psychology courses? On what basis?

Let's say some famous psychologists took the same test at the very beginning of their first semester at university.

As can be seen in the table, there is a relation between the test scores and GPA. As the proficiency scores increase, GPA increases. This pattern is called a positive correlation. In the case of a negative correlation, one score decreases as the other increases. If the correlation (relation) between proficiency and GPA is strong enough, then we can infer your future success.
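The strength and direction of such a relation is usually summarized with a correlation coefficient. The sketch below computes Pearson's r, which is one common choice; the slides do not name a specific coefficient, and the score and GPA values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proficiency scores and GPAs for five students.
proficiency = [95, 105, 115, 125, 135]
gpa = [74, 78, 82, 85, 90]

r_positive = pearson_r(proficiency, gpa)
# Reversing the GPA order makes GPA fall as proficiency rises.
r_negative = pearson_r(proficiency, list(reversed(gpa)))

print(f"Positive correlation: r = {r_positive:.3f}")
print(f"Negative correlation: r = {r_negative:.3f}")
```

Values of r close to +1 or -1 indicate a strong relation, and values near 0 indicate a weak one.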


Considering the table, we can see that your proficiency score is between those of Skinner and Freud. So, your GPA will most probably be between 78 and 82. Congratulations, you have the potential to become a better psychologist than Freud.

So, the third thing that we need to infer is whether we can predict your GPA from your proficiency score.

In sum, descriptive statistics is about describing certain characteristics of the sample or population. Inferential statistics, in contrast, is about predicting certain characteristics of the population by evaluating the characteristics of the sample.
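The "between Skinner and Freud" reasoning above amounts to interpolating between two known points. A minimal sketch follows. The GPAs of 78 and 82 come from the text, but the two test scores (105 and 115) and the assignment of each GPA to a psychologist are assumptions made for illustration.

```python
def interpolate_gpa(score, lower_point, upper_point):
    """Linearly interpolate a GPA for `score` between two (score, GPA) points."""
    (x0, y0), (x1, y1) = lower_point, upper_point
    return y0 + (score - x0) * (y1 - y0) / (x1 - x0)


# (test score, GPA); the test scores are hypothetical.
freud = (105, 78)
skinner = (115, 82)

my_score = 110
predicted_gpa = interpolate_gpa(my_score, freud, skinner)
print(f"Predicted GPA for a score of {my_score}: {predicted_gpa}")
```

With more than two data points, a fitted regression line would normally replace this two-point interpolation, but the logic of predicting one variable from another is the same.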

At this point, we need to define what a sample and a population are.

Basic Concepts: Population and Sample

A population can be defined as all the people or items with the characteristic one wishes to understand. Let's say you believe that blonde girls are not that clever. In this example, all blonde girls in the world are your population. The characteristic that you are interested in is their level of intelligence.

To clarify your hypothesis, you need to limit your population. Are you also interested in girls who dyed their hair blonde? Probably not. Then you should restate your argument: naturally blonde girls are not that clever. Once you define your population properly, you can start collecting data on the characteristic that you are interested in.

A sample is a subset of the population that we can reach and collect data from. Let's say you are really eager to conduct a study on the level of intelligence of naturally blonde girls. Since it is not possible to reach every blonde girl in the world, you need to find a subgroup and give them your IQ test.

Sampling is a vital issue for statistics and research methods. The main purpose of sampling is to reach the most representative subset of the population. If your sample is not representative, your findings will not be valid.

In the 1936 American presidential election, Roosevelt, a Democrat, was being challenged by the Republican Alf Landon. One of the leading magazines of the day, Literary Digest, surveyed voter preferences by mailing questionnaires to 10 million people whose names were gathered from lists of automobile and telephone owners. Over two million people responded, and the results indicated that Landon would beat Roosevelt by a landslide.

In fact, Roosevelt beat Landon by one of the largest margins ever. This was one of the largest surveys ever taken. How could it have been so wrong? The US was in the middle of the Great Depression in 1936, and only a minority of people were financially secure enough to own a car or a telephone. They tended to vote Republican. Most other Americans were worried about buying enough food to feed their families, and they tended to vote Democratic.

To ensure representativeness, inferential statistics requires random sampling. With random sampling, we ensure that each possible sample of the same size has an equal probability of being selected from the population. For instance, suppose that we wish to select five persons at random from our current statistics class. What we need to do is to write the name of each class member on a slip of paper, put those slips in a gallon jar, shake and tumble the contents of the jar well, and withdraw five slips from the lot.
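The jar-and-slips procedure is simple random sampling without replacement, and it can be sketched in a few lines of Python. The class list is hypothetical (43 placeholder names), and the seed is fixed only so that the sketch gives the same draw on every run.

```python
import random

# Hypothetical class list: 43 placeholder names stand in for the real slips.
class_list = [f"Student {i}" for i in range(1, 44)]

# A seeded generator makes the draw repeatable; drop the seed for a fresh draw.
rng = random.Random(2016)

# random.sample draws without replacement, so every possible group of five
# students is equally likely and nobody can be picked twice.
sample = rng.sample(class_list, 5)
print(sample)
```

The Literary Digest survey failed exactly this requirement: people without a car or telephone had no chance of being drawn, however large the mailing was.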

Basic Concepts: Variables and Constants

A variable is a characteristic that can take on different values. Considering our hypothesis about blondes, you can see that the variable that we are interested in is the level of intelligence. When we measure blonde girls' intelligence, we can see that their scores are not identical. In fact, statistics is about variability. With the aid of statistical techniques, we try to organize and understand the variability in nature.

A constant is a characteristic that is identical for each member of the sample. For instance, hair color and gender would be constants in our hypothetical study on the level of intelligence of blonde girls. Additionally, constants delimit the applicability of our findings. Even if we observed an intelligence deficiency in blonde girls, it would not say anything about redheads or blonde boys.

Scales of Measurement

Measurement is the process of assigning numbers to observations. Let's discuss how we measure the properties below.

Weight of a box: a weighing machine
Length of a table: a ruler
Beauty of a competitor in a beauty contest: (?)
Gender of a participant: (?)
Success of a football team in the league: (?)

What about the meaning of the numbers that we assign? Are they the same?

If the weighing machine shows zero, can we take that number as an indicator of no weight at all? What about the judge in the beauty contest? If he assigns zero to a competitor, does it mean she has no beauty?

Let's say Galatasaray won 30 games last year, and Fiskobirlik won 15 games. Can we say Galatasaray won twice as many games as Fiskobirlik? Let's say the rank of Galatasaray is 2 and that of Fiskobirlik is 12. Does that mean Galatasaray's rank of success is 6 times higher than Fiskobirlik's? What about the beauty contest? If Aylin wins the contest and Jale comes third, does it mean Aylin is three times more beautiful than Jale?

Apparently, numbers have different meanings in these situations. To distinguish the different kinds of situations, we need to identify four kinds of scales: nominal, ordinal, interval, and ratio.

Nominal Scales

Nominal scales are the simplest kind of scale. Some variables are qualitative rather than quantitative in nature, for instance biological sex, types of cheese, or brand names of cell phones. Numbers in nominal scales have no meaning other than indicating different categories. If we assign 1 to males and 2 to females, there is no implication that females have more of some dimension than males.
Nominal scales have only two requirements. The categories have to be mutually exclusive: an observation cannot fall into more than one category. The categories have to be exhaustive: there must be enough categories for all observations.

Example: Male and Female are mutually exclusive and exhaustive categories for biological sex. What about gender (social sex)? Some individuals in the biological female category might feel much more like they are male. So, we need to include other categories such as gay, lesbian, transsexual, etc.

Ordinal Scales

Ordinal scales are more complex than nominal ones. The categories must still be mutually exclusive and exhaustive, but they also indicate the order of magnitude of some variable. The outcome of an ordinal scale is a set of ranks. Examples are socio-economic status (SES): low, middle, high; and college students: freshmen, sophomores, juniors, and seniors. Numbers can be assigned to the categories, but those numbers have no meaning other than rank order. Consider our SES example: is the difference between low and middle equal to the difference between middle and high?

Interval Scales

The next major level of complexity is the interval scale. Interval scales have all the properties of ordinal scales. Additionally, the intervals (distances) between scores have the same meaning anywhere on the scale. Examples: level of depression on the Beck Depression Scale, pain, and the Celsius and Fahrenheit temperature scales.

Let's discuss the Celsius scale. The difference between 10 °C and 20 °C is equal to the difference between 20 °C and 30 °C. That is, the energy you need to heat a certain amount of water from 10 °C to 20 °C is equal to the energy needed to heat it from 20 °C to 30 °C. What about 0 °C?
Does it mean there is no heat?

Ratio Scales

Ratio scales are the most complex and advanced scales. They possess all the properties of interval scales and, in addition, have an absolute zero point. Grams for weight and centimeters for height are examples. If something is zero grams, then it has no mass. If something is zero centimeters, then it has no length. The Kelvin scale is a good example. Unlike Celsius and Fahrenheit, Kelvin has an absolute zero point. That is, at zero kelvin a substance would have no molecular motion (energy) and, therefore, no heat.

Why does an absolute zero point matter?

Imagine we want to measure the temperature of our classroom on the Celsius scale. Let's say it is 30 °C. One of our friends says it was 15 °C last winter. So, does it mean it is now twice as hot as last winter?

No, it doesn't. Since the zero point of the Celsius scale is not absolute, we can move it up or down. Let's say we decided to move it 10 °C lower. Our new scale would then show 40 °C for the current temperature and 25 °C for last winter. So, it is not meaningful to assert that a temperature of 30 °C is twice as hot as one of 15 °C, or that a rise from 30 °C to 33 °C is a 10% increase.
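The argument about moving the zero point can be checked numerically. The sketch below uses the two temperatures from the text and the standard conversion K = °C + 273.15; the variable names are illustrative only.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin (absolute zero is -273.15 °C)."""
    return celsius + 273.15


now_c, winter_c = 30.0, 15.0

# Ratios depend on where the zero point sits.
ratio_celsius = now_c / winter_c                      # looks like "twice as hot"
ratio_shifted = (now_c + 10) / (winter_c + 10)        # zero moved 10 degrees lower
ratio_kelvin = celsius_to_kelvin(now_c) / celsius_to_kelvin(winter_c)

# Differences do not depend on the zero point.
diff_celsius = now_c - winter_c
diff_shifted = (now_c + 10) - (winter_c + 10)
diff_kelvin = celsius_to_kelvin(now_c) - celsius_to_kelvin(winter_c)

print(f"Ratio on the Celsius scale: {ratio_celsius}")
print(f"Ratio on the shifted scale: {ratio_shifted}")
print(f"Ratio on the Kelvin scale:  {ratio_kelvin:.3f}")
print(f"Differences: {diff_celsius}, {diff_shifted}, {diff_kelvin:.2f}")
```

Only the Kelvin ratio is physically meaningful, and it shows the room is about 5% warmer in absolute terms, not twice as hot. The differences, by contrast, agree on every scale, which is why addition and subtraction are already valid on interval scales.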

Final notes about scales

The ratio scale subsumes all other scales: Ratio > Interval > Ordinal > Nominal.

Computation with the scores:
Nominal scales: clustering
Ordinal scales: clustering and rank order
Interval scales: addition and subtraction
Ratio scales: addition, subtraction, multiplication and division

Variables and Computational Accuracy

Variables may be either discrete (kesikli) or continuous (sürekli).

A discrete variable can take on only certain values. For instance, the number of students in our classroom is discrete. It is 43 this week, but it was 42 last week, and no value can fall between these two. A continuous variable can take on any value. For instance, temperature can be 29 °C, 29.4 °C or 30 °C.

Even though a variable may be continuous in theory, the process of measurement always reduces it to a discrete one. Imagine that the true weight of a tomato is 0.23138 kilogram. A standard weighing machine is not that sensitive. It would measure weight to the nearest thousandth of a kilogram, so it would show 0.231. Is that a problem?

Within the limits of the recording equipment, it is up to the investigator to determine the degree of accuracy appropriate to the problem at hand. If you want to buy a tomato, 0.00038 kilogram is not important. What if you would like to buy gold? 1 kg of tomatoes costs 1.20 TL, so 0.00038 kg is worth 0.000456 TL. 1 g of gold costs 101 TL, so 0.00038 kg (0.38 g) is worth 38.38 TL.

In psychology, we also need to be very careful about computational accuracy. If a psychologist works on a theoretical construct that is not directly related to individuals' wellbeing, accuracy will not be that important. On a paper-and-pencil attitude measure, it will not matter much if a participant rates his/her favorability toward an attitude object as 7 while his/her true attitude is 8. What about intelligence, aptitude, or skills?
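The tomato and gold comparison earlier in this section can be verified with a short calculation. The sketch below uses Python's `decimal` module to avoid floating-point noise, and it makes the unit conversion explicit (0.00038 kg is 0.38 g). Prices and weights are the ones quoted in the text; the variable names are illustrative.

```python
from decimal import Decimal

true_weight_kg = Decimal("0.23138")
# The scale reports weight to the nearest thousandth of a kilogram.
shown_weight_kg = true_weight_kg.quantize(Decimal("0.001"))
error_kg = true_weight_kg - shown_weight_kg

tomato_price_per_kg = Decimal("1.20")   # TL per kilogram
gold_price_per_g = Decimal("101")       # TL per gram

tomato_error_cost = error_kg * tomato_price_per_kg
gold_error_cost = error_kg * 1000 * gold_price_per_g   # 1 kg = 1000 g

print(f"Scale shows {shown_weight_kg} kg, error is {error_kg} kg")
print(f"Value of the error in tomatoes: {tomato_error_cost} TL")
print(f"Value of the error in gold:     {gold_error_cost} TL")
```

The same measurement error is negligible for tomatoes and costly for gold, which is why the required accuracy depends on the problem at hand.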