week 1 intro to statistics

Upload: enrica-melissa-panjaitan

Post on 04-Feb-2018


TRANSCRIPT

  • 7/21/2019 Week 1 Intro to Statistics


    Week 1: Introduction to Statistics & Data Analysis

    Arfika Nurhudatiana


    Outline

    Concepts: Descriptive statistics vs. Inferential statistics

    Sample vs. Population

    Sampling Procedure

    Qualitative vs. Quantitative variables

    Discrete vs. Continuous variables

    Sample Mean, Median, Range, Variance, Standard Deviation


    A Glimpse of Motivation


    What is Statistics?

    Branch of mathematics, dealing with the collection and analysis of data, leading to statistical inference.

    Keywords: mathematics, data analysis, statistical inference

    Statistical inference: deducing a general conclusion based on collected samples.
    - Accepting/rejecting a hypothesis
    - Deriving estimates


    Example: manufacturing industry

    In the manufacturing industry, it is common to have the following roles:

    Process engineers: monitor the different processes involved, 1 engineer per process

    Product engineers: monitor the output, 1 engineer per product

    Quality assurance engineers: perform statistical investigations, e.g., using ANOVA

    The result of the investigation allows the company to determine the necessary modifications in order to keep the process at a desired level of quality.


    Variations in Data

    Two sources of variation:
    - Variation over time/space: between the value observed at one point in time and the value observed at another point in time
    - Variation in measurement: between the value observed and the true value

    Ideally, if the observed values in a process were always the same and were always on target, there would be no need for statistical methods.

    If, in one batch of thermometers produced, the thermometers (used on the same person at the same time under the same environmental conditions) always gave the same value and the value was accurate (correct), no statistical analysis would be needed to evaluate the products.

    However, our data tend to have variations, and thus we need statistical methods to estimate, as closely as possible, the actual value(s) behind our data.


    Descriptive vs. Inferential Statistics

    Descriptive statistics: statistics which help describe or characterize the nature of the data set. They provide simple summaries about the sample and the measures.
    Example: a student has completed 100 SC


    Sample vs. Population

    Population: the collection of all individual items of a particular type.

    Sample: a collection of observations taken from a population.

    Example: During the campaign period, campaign managers conducted a survey to understand the conditions of the voters. The population was Indonesian citizens who had the right to vote in the 2014 presidential election. The samples were certain numbers of Indonesian citizens located in various regions of Indonesia, with different ages, genders, and occupations.

    Populations and samples can also be students in Binus International, trees in the forest, fish in the sea, manufactured products, etc.


    Example Scenario:

    Consider a market researcher for a soft drink company who might want to determine the sweetness preferences of Americans between the ages of 15 and 25.

    Obviously, gathering data from every individual in this population would be nearly impossible and prohibitively expensive.

    It would be more practical to collect data from a subset, or sample, of the population.

    If the sample is unbiased, the sample data can be used to make inferences about the population.


    Sampling Procedure

    Population: every American in the age group 15 to 25.
    Sample: ?

    In order for a sample to be unbiased, it must be:
    1) representative of the population
       - it fulfils the above criteria: American, aged between 15 and 25
    2) randomly selected
       - everybody in the population has an equal chance of being selected for the sample
       - if the sample comes from only 1 city or 1 school, the conclusion applies only to that city/school and a narrower age group
    3) sufficiently large
       - if the research involved only a few respondents, would you trust the result? Why?
       - a small sample is sensitive to bias (1 wrong answer largely affects the final result)


    Various Sampling Procedures


    Sampling Procedure: Random Selection

    1) Simple Random Sampling
       - any particular sample has the same chance of being selected as any other sample

    2) Stratified Random Sampling

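The two random selection procedures can be sketched in Python as below. This is only an illustration under stated assumptions: the slide names stratified random sampling without describing it, so the sketch assumes the common variant in which the same fraction is drawn at random from every stratum. The helper names (`simple_random_sample`, `stratified_random_sample`) and the student list are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from each stratum (assumed variant)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 40 IS students and 20 CS students (made-up data).
students = [("IS", i) for i in range(40)] + [("CS", i) for i in range(20)]

srs = simple_random_sample(students, 6, seed=1)
strat = stratified_random_sample(students, lambda s: s[0], 0.1, seed=1)
print(srs)    # 6 students, programme mix left to chance
print(strat)  # 4 IS students and 2 CS students
```

With simple random sampling the programme mix of the sample varies from draw to draw; the stratified version fixes the mix to match the population.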

    Sampling Procedure: Nonrandom Selection

    1) Convenience Sampling / Accidental Sampling
       - the units that are selected for inclusion in the sample are the easiest to access
       - Example: the first 50 respondents (but be careful, most of them may come from the IS program)

    2) Systematic Sampling
       - the researcher first randomly picks the first item or subject from the population; then the researcher selects each nth subject from the list

    3) Purposive Sampling / Judgmental Sampling
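Systematic sampling as described above (a random first pick, then every nth subject) might look like the following sketch. The function name `systematic_sample`, the step of 10, and the list of respondent IDs are assumptions made for illustration only.

```python
import random

def systematic_sample(items, step, seed=None):
    """Pick a random start among the first `step` items, then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)   # random position of the first pick
    return items[start::step]

respondents = list(range(1, 101))  # hypothetical list of 100 respondent IDs
picked = systematic_sample(respondents, step=10, seed=0)
print(picked)                      # 10 IDs, each 10 apart
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the order of the list.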


    Qualitative vs. Quantitative Variables

    Quantitative variables: measures of values or counts, expressed as numbers (can be discrete, can be continuous).
    Example: How many children do you have? How often do you go shopping?

    Qualitative (categorical) variables: measures of "types", which may be represented by a name, a symbol, or a number code.
    Example: Which major do you study? What is your occupation?
    Qualitative variables can be nominal (no order/ranking sequence) or ordinal (has an order, e.g., like, neutral, dislike).


    Discrete vs. Continuous Variables

    Discrete variables are countable in a finite amount of time.
    Example: the number of students in a classroom. (integers) (int)

    Continuous variables are usually obtained by measuring.
    Example: length, weight, and time. Since continuous variables are real numbers, we usually round them. (double)


    Measures of Location: The Sample Mean and Median

    Measures of location are designed to provide the analyst with some quantitative values of where the centre, or some other location, of the data is located.

    1) Mean: average value


    2) Median

    The purpose of the sample median is to reflect the central tendency of the sample in such a way that it is uninfluenced by extreme values or outliers.

    Median and mean can be quite different.
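A small sketch of how the two measures react to an outlier, using Python's standard `statistics` module. The measurements are made up for illustration; the last value of the second list stands for a hypothetical data-entry mistake.

```python
from statistics import mean, median

weights = [2.0, 2.1, 2.2, 2.3, 2.4]    # hypothetical measurements
with_outlier = weights[:-1] + [24.0]   # last value mistyped as 24.0

mean_clean, median_clean = mean(weights), median(weights)
mean_out, median_out = mean(with_outlier), median(with_outlier)

print(mean_clean, median_clean)  # both close to 2.2
print(mean_out, median_out)      # mean jumps to about 6.52, median stays at 2.2
```

A single extreme value pulls the mean far away from the bulk of the data, while the median stays at the middle observation.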


    Example

    Two samples of 10 northern red oak seedlings were planted in a greenhouse, one containing seedlings treated with nitrogen and the other containing seedlings with no nitrogen. All other environmental conditions were held constant. All seedlings contained the fungus Pisolithus tinctorus.

    The stem weights in grams were recorded after 140 days.

    Mean:   \bar{x} (nitrogen) = ?      \bar{x} (no nitrogen) = ?

    Median: \tilde{x} (nitrogen) = ?    \tilde{x} (no nitrogen) = ?

    Which one has the healthier stems (higher stem weight)?

    Mean (nitrogen): 0.565

    Median: 0.5…


    Other measures of location:

    A trimmed mean is computed by "trimming away" a certain percent of both the largest and the smallest values. For example, the 10% trimmed mean is found by eliminating the largest 10% and the smallest 10% of the values and computing the average of the remaining values.

    Before trimming: What do you observe?
    - The mean changes slightly; the median does not change.
    - The mean gives more detailed information (it is more sensitive to variations).
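The trimming rule can be sketched as follows. The function name `trimmed_mean` and the ten scores are hypothetical; the sketch assumes the number of values dropped at each end is the given percentage of the sample size, rounded down.

```python
from statistics import mean

def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values at each end of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # number of values dropped at each end
    return mean(ordered[k:len(ordered) - k])

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value

plain = mean(scores)                 # uses all ten values
trimmed = trimmed_mean(scores, 10)   # drops the smallest and the largest value
print(plain, trimmed)
```

Here the plain mean is 14.5 while the 10% trimmed mean is 5.5, which shows how trimming reduces the pull of the extreme value.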

    Sample mean vs. Population mean

    Is what we just calculated the sample mean or the population mean?

    The sample mean gives incomplete information (it is true for the collected sample only, not for the real population).

    However, by collecting the right samples, we expect the sample mean to be as near as possible to the population mean.

    Therefore, in future chapters, the sample mean is calculated as an estimate of the population mean.


    Measures of Variability: The Sample Range, Standard Deviation, and Variance

    Data Set A: 3, 5, 7, 10, 10
    Data Set B: 7, 7, 7, 7, 7

    What are the mean and median of the above data sets?

    But we know that the two data sets are not identical.
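A quick check of the question above, as a sketch with Python's `statistics` module. It assumes the first value of Data Set A is 3, which is the value that makes both data sets share the same mean and median; the range is added to show one way in which they still differ.

```python
from statistics import mean, median

data_a = [3, 5, 7, 10, 10]   # first value assumed to be 3
data_b = [7, 7, 7, 7, 7]

summary = {}
for name, data in (("A", data_a), ("B", data_b)):
    summary[name] = (mean(data), median(data), max(data) - min(data))
    print(name, "mean:", summary[name][0],
          "median:", summary[name][1],
          "range:", summary[name][2])
```

Both sets have mean 7 and median 7, but the range is 7 for Data Set A and 0 for Data Set B, so a measure of location alone cannot tell them apart.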


    Measures of Variability: The Sample Range, Standard Deviation, and Variance

    How much the data points differ from the mean, and from each other, can be measured using the variance, standard deviation, and range.

    The variance, standard deviation, and range are all measures of spread.

    Sample range = X_max - X_min

    Sample variance (n - 1 degrees of freedom):

    s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

    Sample standard deviation:

    s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

    The average of the squared deviations about the mean is called the sample variance.

    Q: Why squared? Why n - 1?
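The definitions can be followed step by step in a short Python sketch. The function name `sample_variance` is hypothetical, and the data are Data Set A from the earlier slide with its first value assumed to be 3. The result is compared with `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(xs):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data_a = [3, 5, 7, 10, 10]          # first value assumed to be 3

s2 = sample_variance(data_a)        # (16 + 4 + 0 + 9 + 9) / 4 = 9.5
s = sqrt(s2)                        # about 3.08
data_range = max(data_a) - min(data_a)

print(s2, s, data_range)
print(variance(data_a), stdev(data_a))   # library values for comparison
```

For Data Set B (all values equal to 7) every deviation is zero, so the range, variance, and standard deviation are all 0.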


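    If the deviations were not squared, the numerator would always equal 0, because the negative deviations about the mean always cancel out the positive deviations about the mean.

    Sample variance (n - 1 degrees of freedom):

    s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

    Why n - 1? Because the deviations sum to zero, the last value of x_i - \bar{x} is determined by the first n - 1 of them. If n is very large, the difference between dividing by n and dividing by n - 1 becomes insignificant.

    Sample standard deviation:

    s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}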
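Both answers can be checked numerically. The sketch below uses a small made-up data set (the same values assumed for Data Set A earlier) and shows that the raw deviations sum to zero, that the last deviation follows from the others, and how the ratio n / (n - 1) approaches 1 as n grows.

```python
data = [3, 5, 7, 10, 10]            # small illustrative data set
n = len(data)
x_bar = sum(data) / n

deviations = [x - x_bar for x in data]
sum_dev = sum(deviations)                   # negatives cancel positives
last_from_others = -sum(deviations[:-1])    # last deviation implied by the first n - 1

print(deviations, sum_dev)
print(last_from_others, deviations[-1])

# Effect of dividing by n - 1 instead of n for growing sample sizes.
for size in (5, 50, 5000):
    print(size, size / (size - 1))
```

Only n - 1 of the deviations are free to vary, which is the sense in which the variance has n - 1 degrees of freedom.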


    No nitrogen: standard deviation = 0.0728 gram

    Nitrogen: standard deviation = 0.187 gram

    Conclusion: The group with nitrogen has a larger variance, and the group without nitrogen tends to be more consistent.


    Summary

    Concepts: Descriptive statistics vs. Inferential statistics

    Sample vs. Population

    Sampling Procedure: random & nonrandom

    Qualitative vs. Quantitative variables

    Discrete vs. Continuous variables

    Sample Mean, Median, Range, Variance, Standard Deviation


    Exercise


    Sample size: 15

    Mean: 3.79

    Median: 3.6

    Trimmed: 2.5, 2.8, 2.8 and 5.6, 5.2, 4.8

    Trimmed mean: 3.68

    Variance: 0.94

    Std: 0.97