92465757 statistical techniques for analyzing quantitative data

Upload: rajib-mukherjee

Post on 04-Apr-2018


  • 7/29/2019 92465757 Statistical Techniques for Analyzing Quantitative Data

    1/41

    Statistical Techniques for

    Analyzing Quantitative Data

    Maryam Ramezani

    Values in Computer Technology

    CSC 426


    Outline

    Statistics in Research

    Exploring and Organizing a Data Set

    Nature of the Data: Nominal, Ordinal, Interval, Ratio

    Normal and Non-Normal Distributions

    Descriptive Statistics

    Inferential Statistics

    Statistical Software Packages


    Role of Statistics in Research

    With statistics, we can summarize large bodies of data, make predictions about future trends, and determine when different experimental treatments have led to significantly different outcomes.

    Statistics are among the most powerful tools in the researcher's toolbox.


    How do statistics come into research?

    In quantitative research we use numbers to represent physical or nonphysical phenomena.

    We use statistics to summarize and interpret those numbers.


    Exploring and Organizing a Data Set

    Look at your data and find ways of organizing them.

    Example: test scores for 11 children:

    Ruth 96, Robert 60, Chuck 68, Margaret 88, Tom 56, Mary 92, Ralph 64, Bill 72, Alice 80, Adam 76, Kathy 84

    What do you see?


    Exploring and Organizing a Data Set

    Original order:

    Student    Score
    Ruth       96
    Robert     60
    Chuck      68
    Margaret   88
    Tom        56
    Mary       92
    Ralph      64
    Bill       72
    Alice      80
    Adam       76
    Kathy      84

    Alphabetical order:

    Student    Score
    Adam       76
    Alice      80
    Bill       72
    Chuck      68
    Kathy      84
    Margaret   88
    Mary       92
    Ralph      64
    Robert     60
    Ruth       96
    Tom        56

    Two groups, each in alphabetical order:

    Student    Score        Student    Score
    Alice      80           Adam       76
    Kathy      84           Bill       72
    Margaret   88           Chuck      68
    Mary       92           Ralph      64
    Ruth       96           Robert     60
                            Tom        56

    [Chart: the scores plotted as "Series1"; horizontal axis 0 to 15, vertical axis 0 to 120]
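
    The same kind of reorganization can be done programmatically. The following is a minimal Python sketch, not part of the original slides: the variable names are illustrative, and the sort by score is an extra arrangement that the slide itself does not show.

```python
# Test scores for the 11 children, in the order given on the slide.
scores = {
    "Ruth": 96, "Robert": 60, "Chuck": 68, "Margaret": 88,
    "Tom": 56, "Mary": 92, "Ralph": 64, "Bill": 72,
    "Alice": 80, "Adam": 76, "Kathy": 84,
}

# Alphabetical order by student name.
alphabetical = sorted(scores.items())

# Assumption: an additional arrangement, highest score first.
by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in alphabetical:
    print(f"{name:<10}{score}")
```

    Each arrangement keeps the same data; only the ordering changes, which is often enough to make a pattern visible.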


    Using Computer Spreadsheets to Organize

    and Analyze Data

    Sorting

    Graphing

    Formulas

    What-ifs

    Save, store, recall, and update information


    Functions of Statistics

    Descriptive statistics: describe what the data look like.

    Inferential statistics: draw inferences about a large population from small samples.


    Considering the Nature of the Data

    Continuous or discrete

    Nominal, ordinal, interval or ratio scale

    Normal or non-normal distribution


    Continuous versus Discrete Variables

    Continuous data: take on any value within a finite or infinite interval. You can count, order, and measure continuous data.

    Examples: height, weight, temperature, the amount of sugar in an orange, the time required to run a mile.

    Discrete data: the values or observations are distinct and separate, i.e., they can be counted (1, 2, 3, ...).

    Examples: the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).


    Nominal Data

    The numbers are simply labels. You can count, but not order or measure, nominal data.

    Example: males could be coded as 0 and females as 1; the marital status of an individual could be coded as Y if married and N if single.

    Classification data, e.g., m/f

    No ordering, e.g., it makes no sense to state that M > F

    Arbitrary labels, e.g., m/f, 0/1, etc.


    Ordinal Data

    Ordered, but the sizes of the differences between values are not meaningful.

    E.g., Likert scales: rank your degree of satisfaction on a scale of 1 to 5.

    The difference in enjoyment expressed by giving a rating of 2 rather than 1 might be much less than the difference in enjoyment expressed by giving a rating of 4 rather than 3.

    You can count and order, but not measure, ordinal data.


    Interval Data

    Ordered, constant scale, but no natural zero.

    Differences make sense, but ratios do not.

    E.g., temperature: 30° − 20° = 20° − 10°, but 20° is not twice as hot as 10°!

    E.g., dates: the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin then.
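
    The point about ratios can be checked with a little arithmetic. The sketch below assumes the temperatures in the example are in degrees Celsius (the slide gives no unit) and converts them to kelvin, a scale with a true zero; the variable names are illustrative.

```python
# Celsius is an interval scale: its zero point is arbitrary.
c_low, c_mid, c_high = 10.0, 20.0, 30.0

# Differences are meaningful: both gaps are 10 degrees.
equal_gaps = (c_high - c_mid) == (c_mid - c_low)

# The naive ratio suggests "twice as hot" ...
naive_ratio = c_mid / c_low

# ... but on the kelvin scale, which has a true zero, the ratio is close to 1.
KELVIN_OFFSET = 273.15
kelvin_ratio = (c_mid + KELVIN_OFFSET) / (c_low + KELVIN_OFFSET)

print(equal_gaps, naive_ratio, round(kelvin_ratio, 3))
```

    The ratio changes when the zero point changes, which is why ratios of interval data carry no meaning.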


    Ratio Data

    Like interval data, but with a true zero.

    Ordered, constant scale, natural zero.

    E.g., height, weight, age, length


    Normal and Non-Normal Distributions


    Normal Distribution


    Non-Normal Distributions

    Skewed to the Left (Negatively Skewed)

    Skewed to the Right (Positively Skewed)


    Leptokurtic and Platykurtic

    Distributions


    Descriptive Statistics

    Descriptive statistics describe data:

    Points of central tendency

    Amount of variability

    Relation of different variables to each other


    Points of Central Tendency: Mean

    Measuring center: if the n observations are $x_1, x_2, \ldots, x_n$, the arithmetic mean is

    $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

    Geometric mean

    E.g., biological growth, population growth


    Measure of Central Tendency

    Measure           How it is determined                                   Appropriate data
    Mode              The most frequently occurring score is identified.     Nominal, ordinal, interval, and ratio
    Median            The midpoint of the data                               Ordinal, interval, and ratio
    Arithmetic mean   All scores are added and the sum is divided by         Interval and ratio
                      the number of scores.
    Geometric mean    All scores are multiplied together, and the nth        Ratio scales
                      root of their product is computed.
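
    The four measures can be computed with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). This is a small sketch rather than anything from the slides: the test scores come from the earlier example, while the ratings and growth factors are hypothetical values chosen only to illustrate the mode and the geometric mean.

```python
import statistics

# Test scores from the earlier slide.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # midpoint of the sorted scores

# Hypothetical satisfaction ratings, used only to illustrate the mode.
ratings = [3, 4, 4, 5, 2, 4]
mode_rating = statistics.mode(ratings)    # most frequently occurring value

# Hypothetical yearly growth factors; the geometric mean is the
# nth root of the product of the n values.
growth_factors = [1.10, 1.20, 1.30]
geo_mean = statistics.geometric_mean(growth_factors)

print(mean_score, median_score, mode_rating, round(geo_mean, 4))
```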


    Measures of Variability

    How great is the spread?

    Range = highest score − lowest score

    Quartiles: the pth percentile of a distribution is the value such that p percent of the observations fall at or below it.

    The 50th percentile = median, M

    The 25th percentile = first quartile, Q1

    The 75th percentile = third quartile, Q3

    Interquartile range = Q3 − Q1

    Example:

    13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30

    M = ?, Q1 = ?, Q3 = ?
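
    One way to work the example is sketched below. Quartile conventions differ between textbooks and software, so this assumes a common classroom convention that the slide does not spell out: Q1 and Q3 are the medians of the lower and upper halves of the sorted data. Under that assumption the result is M = 25, Q1 = 21, Q3 = 27.

```python
import statistics

data = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26, 26, 27, 27, 27, 28, 28, 30, 30]

ordered = sorted(data)
half = len(ordered) // 2

# Median of the whole data set.
m = statistics.median(ordered)

# Assumed convention: Q1 and Q3 are the medians of the lower and upper halves
# (for an odd count, the middle value is left out of both halves).
lower = ordered[:half]
upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
q1 = statistics.median(lower)
q3 = statistics.median(upper)

print(m, q1, q3, q3 - q1)
```

    Other conventions, such as the interpolating methods used by many statistics packages, can give slightly different quartiles for the same data.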


    Measures of Variability

    Standard deviation:

    $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

    Standardized score:

    $$z = \frac{x - \mu}{\sigma}$$
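
    As a rough illustration, the standard deviation and standardized scores of the earlier test scores can be computed as follows. The sketch assumes the sample form of the standard deviation (divisor n − 1) and, because no population values are available, standardizes with the sample mean and sample standard deviation; both choices are assumptions rather than something the slide states.

```python
import math
import statistics

scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]
n = len(scores)
mean_score = sum(scores) / n

# Sample standard deviation with the n - 1 divisor.
s = math.sqrt(sum((x - mean_score) ** 2 for x in scores) / (n - 1))

# statistics.stdev uses the same n - 1 definition, so the two should agree.
assert math.isclose(s, statistics.stdev(scores))

# Assumption: z-scores use the sample mean and s in place of the
# unknown population mean and standard deviation.
z_scores = [(x - mean_score) / s for x in scores]

print(round(s, 3), round(z_scores[0], 3))
```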


    Measure of Relationship: Correlation

    Correlation indicates the strength and direction of a linear relationship between two variables.

    See page 266 for other examples of correlation statistics.
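
    The slide does not name a particular statistic; the sketch below uses Pearson's r, a widely used measure of linear correlation, as an assumed example. The function name and the data (hours studied and test scores for five students) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
test_scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, test_scores)
print(round(r, 3))
```

    Values of r near +1 or −1 indicate a strong linear relationship (positive or negative), and values near 0 indicate a weak one.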


    Notes about Correlation

    Finding a substantial correlation between two characteristics requires reasonable validity and reliability in measuring both.

    Correlation does not indicate causation.


    Examples of Using Statistics in Computer Science

    Conceptual representation of user transactions or sessions

    Rows: session/user data. Columns: pageviews/objects.

               A     B     C     D     E     F
    user0     15     5     0     0     0   185
    user1      0     0    32     4     0     0
    user2     12     0     0    56   236     0
    user3      9    47     0     0     0   134
    user4      0     0    23    15     0     0
    user5     17     0     0   157    69     0
    user6     24    89     0     0     0   354
    user7      0     0    78    27     0     0
    user8      7     0    45    20   127     0
    user9      0    38    57     0     0    15


    Inferential Statistics

    We use sample statistics as estimates of population parameters.

    The quality of all statistical analysis depends on the quality of the sample data.

    [Figure: a sample drawn from a population]

    Random sampling: every unit in the population has an equal chance of being chosen.

    A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.


    Some definitions

    Parameter: describes a population

    Statistic: describes a sample

                   Sample statistic    Population parameter
    Mean           $\bar{x}$           $\mu$
    Proportion     $p$                 $P$
    Variance       $s^2$               $\sigma^2$
    Number         $n$                 $N$

    A parameter is a characteristic or quality of a population that is constant in concept; however, its value is variable.

    Example: radius is a parameter of a circle.


    Inferential Statistics

    Estimate a population parameter from a random sample

    Test statistical hypotheses


    Inferential Statistics: Estimate a

    Population Parameter from Sample

    All sample statistics have some error in estimating population parameters.

    Example: estimate the mean height of 10-year-old boys in Chicago from a sample of 200 boys.

    How close is the sample mean to the population mean? We don't know, but we do know:

    The means from an infinite number of samples form a normal distribution.

    The population mean equals the average (mean) of all the sample means.

    The standard deviation of the distribution of sample means (the standard error) is directly related to the standard deviation of the characteristic in question for the overall population.


    Standard Error

    The standard error tells us how much a particular mean varies from one sample to another when all samples are the same size and are drawn randomly from the same population.

    Standard error:

    $$\sigma_M = \frac{\sigma}{\sqrt{n}}$$

    Here n is the size of each sample and $\sigma$ is the population standard deviation, which we don't have!

    We use the standard deviation of the sample instead:

    $$\sigma_M = \frac{s}{\sqrt{n-1}}$$


    Accuracy of the Estimator

    As in many problems, there is a trade-off between accuracy and dollars.

    What will we get for our money if we invest dollars in obtaining a larger sample size?

    n = 100?

    n = 200?
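
    The trade-off can be made concrete with the standard-error form used in these slides, s.d./sqrt(n − 1). The sketch below compares n = 100 with n = 200 for a hypothetical sample standard deviation of 10; the function name and the value of s are assumptions made for illustration.

```python
import math

def standard_error(s, n):
    """Standard error of the mean, using the slides' s / sqrt(n - 1) form."""
    return s / math.sqrt(n - 1)

s = 10.0  # hypothetical sample standard deviation

se_100 = standard_error(s, 100)
se_200 = standard_error(s, 200)

print(round(se_100, 3), round(se_200, 3), round(se_100 / se_200, 3))
```

    Doubling the sample size shrinks the standard error by a factor of roughly the square root of 2, not by half, so each additional gain in accuracy costs more than the last.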


    Point versus Interval Estimate

    A point estimate is a single value (a point) taken from a sample and used to estimate the corresponding parameter of a population.

    $\bar{x}$, $s$, $s^2$, and $r$ estimate $\mu$, $\sigma$, $\sigma^2$, and $\rho$, respectively.

    An interval estimate is a range of values (an interval) within whose limits a population parameter probably lies.

    We say that we are 95% confident that the unknown population mean lies in the interval

    $$\left(\bar{x} - \frac{2\sigma}{\sqrt{n}},\ \bar{x} + \frac{2\sigma}{\sqrt{n}}\right)$$

    This is a 95% confidence interval for $\mu$.

    For only 5% of all samples does this interval fail to contain $\mu$; that is, 5% of all samples give inaccurate results.
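
    A short sketch of the interval calculation, following the slide's rule of the sample mean plus or minus 2 standard errors. The function name and the input values (sample mean 50, standard deviation 10, n = 100) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def confidence_interval_95(sample_mean, sigma, n):
    """Approximate 95% interval: sample mean plus or minus 2 standard errors."""
    margin = 2 * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical values, for illustration only.
low, high = confidence_interval_95(sample_mean=50.0, sigma=10.0, n=100)
print(low, high)
```

    With these inputs the standard error is 1, so the interval runs from 48 to 52.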


    Testing Hypothesis

    Confidence intervals are used when the goal of our analysis is to estimate an unknown parameter in the population.

    A second goal of a statistical analysis is to verify some claim about the population on the basis of the data.

    Research hypothesis ≠ statistical hypothesis

    A test of significance is a procedure to assess the truth of a hypothesis using the observed data. The results of the test are expressed in terms of a probability that measures how well the data support the hypothesis.


    Example

    To determine whether the mean nicotine content of a brand of cigarettes is greater than the advertised value of 1.4 milligrams, a health advocacy group takes a sample of 500 cigarettes and measures the amount of nicotine in the sample.

    Sample values: the sample average of nicotine = 1.51 mg; the standard deviation = 1.016.

    The estimated amount of nicotine is 1.51 mg, based on the sample values.

    The standard error of the sample average is S.E. = s.d./sqrt(n − 1) = 0.045.

    Is there an actual difference between the sample value (1.51 mg) and the advertised value (1.4 mg)? Or is it just due to sampling error?

    To answer this question we need a test of significance.


    Stating the Hypotheses

    The null hypothesis H0 expresses the idea that the observed difference is due to chance. It is a statement of no effect or no difference, and is expressed in terms of the population parameter.

    Let $\mu$ denote the true average amount of nicotine. H0: $\mu$ = 1.4 mg

    The alternative hypothesis Ha represents the idea that the difference is real. It is expressed as the statement we hope or suspect is true instead of the null hypothesis.

    The alternative hypothesis states that the cigarettes contain a higher amount of nicotine, that is, Ha: $\mu$ > 1.4 mg
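
    The slides do not say which test is applied to the nicotine example. As one possible sketch, the code below runs a one-sided z-test, an assumption that is reasonable for a sample of 500, using the sample values from the example and the advertised value of 1.4 milligrams. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 500
sample_mean = 1.51   # mg
sample_sd = 1.016
mu_0 = 1.4           # advertised value under H0, in mg

# Standard error as given in the example: s.d. / sqrt(n - 1).
standard_error = sample_sd / math.sqrt(n - 1)

# Assumption: a one-sided z-test of H0 (mean = 1.4) against Ha (mean > 1.4).
z = (sample_mean - mu_0) / standard_error
p_value = 1 - NormalDist().cdf(z)

print(round(standard_error, 3), round(z, 2), round(p_value, 4))
```

    The computed standard error should reproduce the 0.045 quoted in the example. A small p-value means a difference this large would be unlikely if H0 were true, which counts as evidence in favor of Ha.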


    General comments on stating hypotheses

    It is not easy to state the null and the alternative hypotheses!

    The hypotheses are statements about the population values.

    The alternative hypothesis Ha is often called the researcher's hypothesis, because it is the hypothesis we are interested in.

    A significance test is a test against the null hypothesis.

    Often we set Ha first, and then H0 is defined as the opposite statement!


    Errors in Hypothesis testing

    Type I error: the null hypothesis is rejected when it is in fact true; that is, H0 is wrongly rejected.

    Type II error: the null hypothesis H0 is not rejected when it is in fact false.


    Meta-Analysis

    "Meta-analysis refers to the analysis of analyses ... the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings." (Glass, 1976, p. 3)

    Conduct a fairly extensive search for relevant studies.

    Identify appropriate studies to include in the meta-analysis.

    Convert each study's results to a common statistical index.


    Using Statistical Software Packages

    SPSS

    SAS

    MATLAB Statistics Toolbox

    SYSTAT, Minitab, StatView, Statistica


    Interpreting the Data

    Relating the findings to the original research problem and to the specific research questions and hypotheses

    Relating the findings to preexisting literature, concepts, theories, and research results

    Determining whether the findings have practical significance as well as statistical significance

    Identifying limitations of the study