sampling en

Upload: nitin-tyagi

Post on 06-Apr-2018


  • 8/3/2019 Sampling En

    1/19


    MaMaEuSch

    Management Mathematics for European Schools

    http://www.mathematik.uni-kl.de/ mamaeusch

    Population and sample. Sampling techniques

    Paula Lagares Barreiro, Justo Puerto Albandoz

    MaMaEuSch

    Management Mathematics for European Schools, 94342-CP-1-2001-1-DE-COMENIUS-C21

    University of Seville

    This project has been carried out with the partial support of the European Community in the framework of the Sokrates programme. The content does not necessarily reflect the position of the European Community, nor does it involve any responsibility on the part of the European Community.


    Contents

    1 Population and sample. Sampling techniques
      1.1 Reasons to use sampling. Previous considerations
      1.2 Sampling techniques
      1.3 Random sampling with and without replacement
      1.4 Stratified sampling
      1.5 Cluster sampling
      1.6 Systematic sampling
      1.7 Other sampling techniques

    2 An example of the application of sampling techniques


    Chapter 1

    Population and sample. Sampling techniques

    In this chapter we extend what we presented at the beginning of Descriptive Statistics, adding the definitions of some sampling techniques and concepts so that we can decide which sampling technique is appropriate for each situation.

    Imagine, for instance, that your class has been chosen as a sample of a population. The study to be carried out could deal with different themes, for example:

    1. The opinion about the possibility of organizing alternative activities in your city, and a proposal of the activities that could be organized.

    2. A poll about opinions on the different political leaders.

    3. The opinion about the possible choices for an end-of-year trip with the students of your class.

    Do you think that your class would be a good sample for any of these situations? For the second situation, for instance, the students of a class are not an appropriate sample. For the first situation, the students of a class can give us interesting information, though the sample may be too small and we could lack information (boys and girls of other ages, living in different quarters, ...), while for the third situation the sample can be very useful. Therefore, it is very important to choose an appropriate sampling technique, one which assures us that we are choosing a good sample for the study we want to make.

    1.1 Reasons to use sampling. Previous considerations

    Let us imagine that we are going to carry out studies to get the following information:

    The percentage of the Spanish population that has access to the internet.

    The average lifetime of a particular brand of batteries.


    In the first case, the population you would have to ask is bigger than 40 million people. It is obvious that interviewing more than 40 million people requires a big effort in many respects.

    First of all, there is a big need of time and, second, of money, because it is necessary to employ people to make the interviews, pay for their trips so that they can go to every village, etc.

    Moreover, there is an additional difficulty: it is complicated to reach each and every Spaniard, because when we make the interviews there will be people in hospital, on a trip to a foreign country, etc. In this situation, for economic reasons, it will be convenient to interview a certain part of the population, a sample, chosen in an appropriate way so that we can later obtain conclusions for the whole population.

    In the second situation we have a different difficulty. If we want to know how long a certain battery lasts, we have to use it until it is exhausted. Therefore, we somehow destroy this element of the population. If we had to test each and every battery of the population, we would keep none of them. Thus, what we should do in this situation is also to choose an appropriate sample, from which we can then draw general conclusions.

    For the reasons we have just mentioned, it is convenient in many instances to use samples. But if we want to get really good conclusions from them, we need to make sure that we make the right choice of our samples. For instance, in the case of internet access in Spain, if we choose 10 people out of the 40 million inhabitants, this is clearly not enough; it is not a representative sample. It will also not be representative if we choose 100 people from Madrid, or all your friends and family. There are some topics which should be clearly defined once we want to sample:

    1. The selection method for the elements of the population (the sampling method to be used).

    2. Sample size.

    3. Reliability of the conclusions that we can obtain, that is, an estimation of the error that we are going to make (in terms of probability).

    As we have just said, an inappropriate selection of the elements of the sample can cause further errors when we estimate the corresponding parameters of the population. But we can find some other types of errors: the interviewer can be partial, that is, he can promote some answers more than others. It can also happen that the person we are going to interview does not want to answer certain questions (or cannot answer). We classify all these possible errors in the following way:

    1. Selection error: it occurs if any of the elements of the population has a higher probability of being selected than the rest. Imagine that we want to measure how satisfied the clients of a gymnasium are and, for that, we interview some of them from 10 to 12 in the morning. This means that the people who go to the gymnasium in the afternoon will not be represented, and so the sample will not be representative of all the clients. A way to avoid this kind of error is to choose the sample so that all the clients have the same probability of being selected.

    2. Non-answer error: it is also possible that some of the elements of the population do not want to or cannot answer certain questions. It can also happen, when we have a questionnaire including personal questions, that some of the members of the population do not answer honestly. These errors are generally very complicated to avoid, but in case we want to check honesty in the answers, we can include some questions (filter questions) to detect whether the answers are honest.

    After what we have seen so far, we can say that we have a biased sample when it is not representative of the population.

    1.2 Sampling techniques

    We have already stressed the importance of a right choice of the elements of the sample so as to make it representative of our population but, how can we classify the different ways of choosing a sample? We can say that there are three types of sampling:

    1. Probability sampling: it is the one in which each sample has the same probability of being chosen.

    2. Purposive sampling: it is the one in which the person who is selecting the sample is the one who tries to make the sample representative, depending on his opinion or purpose, so the representation is subjective.

    3. No-rule sampling: we take a sample without any rule; the sample is representative if the population is homogeneous and we have no selection bias.

    We will always use probability sampling because, if we choose the appropriate technique, it assures us that the sample is representative and we can estimate the sampling errors.

    There are different types of probability sampling:

    Random sampling with and without replacement.

    Stratified sampling.

    Cluster sampling.

    Systematic sampling.

    Other types of sampling techniques.

    Let us imagine now that we have already selected a sample. From a high school with 560 students, we have selected a sample of 28 students to find out whether they have an internet connection at home. But what does it mean to select 28 out of 560? Which proportion of the population are we selecting? And when we want to draw conclusions about the population, how many students of the population does each of the sample elements represent?

    To calculate the proportion of students that we are interviewing, we divide the sample size by the population size, that is, 28/560 = 0.05, which means that we poll 5% of the population.

    Now we calculate how many students each element of the sample represents. We take the other quotient, dividing the number of elements of the population by the number of elements of the sample: 560/28 = 20, which means that each student in the sample represents 20 students of the high school. The two concepts that we have just presented have the following formal definitions:

    1. Elevation factor: it is the quotient between the size of the population and the size of the sample, $N/n$. It represents the number of elements existing in the population for each element of the sample.

    2. Sampling factor (or sampling fraction): it is the quotient between the size of the sample and the size of the population, $n/N$. If this quotient is multiplied by 100, we get the percentage of the population represented in the sample.
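The two quotients can be checked with a few lines of Python. This is only a minimal sketch; the function names `sampling_fraction` and `elevation_factor` are illustrative and do not come from the text.

```python
def sampling_fraction(n, N):
    """Sample size divided by population size (n/N)."""
    return n / N


def elevation_factor(n, N):
    """Population size divided by sample size (N/n)."""
    return N / n


# High school example from the text: 28 students sampled out of 560.
f = sampling_fraction(28, 560)
E = elevation_factor(28, 560)
print(f"sampling fraction = {f:.2f} ({100 * f:.0f}% of the population)")
print(f"elevation factor  = {E:.0f} students represented by each sampled student")
```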

    1.3 Random sampling with and without replacement

    We have already mentioned that if we want the sample we get to be representative, we should choose a probabilistic sampling technique. How would you select 28 students out of 560 in a high school so that all of them have the same probability of being in the sample? The easiest thing would be to hold a draw to choose 28 of them, that is, to choose them randomly, so that they all have the same chance of belonging to the sample.

    This selection process corresponds to random sampling. We say that we are making random sampling when the process through which we choose the sample guarantees that all the possible samples that we can take from the population have the same probability of being chosen, that is, all the elements of the population have the same probability of being chosen to belong to the sample.

    When a certain element is selected, we measure the variables needed in the study, and it can be selected again, we say that we make sampling with replacement. This sampling technique is usually called simple random sampling.

    In case the element cannot be selected again after being selected once, we say that we have obtained the sample through random sampling without replacement.

    In our example, when we select the sample out of the 560 students of the high school, if we are going to ask whether they have an internet connection at home or not, it is not interesting for us to ask the same person twice, so once we choose an element of the population we do not want to choose it again. So we would make random sampling without replacement.

    Though these two methods are different, when the size of the population is infinite, or so big that we can consider it infinite, both methods lead us to similar conclusions. Nevertheless, if the sampling fraction $n/N$ is greater than 0.1 (we sample more than 10% of the population), the difference between the conclusions we get may be important.

    When we ask in our example whether the students have an internet connection at home or not, we are interested not only in the number of students having the connection but also in the proportion it represents in the high school. These two values, and the average in some other cases (for instance, when we ask about the height of the students), are the parameters calculated most often and the ones we usually want to estimate.
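The difference between the two draws can be illustrated with Python's standard `random` module. This is a sketch under simple assumptions: the students are labelled 1 to 560 (hypothetical labels), and the seed is fixed only so that the run is repeatable.

```python
import random

population = list(range(1, 561))  # students numbered 1..560 (hypothetical labels)
rng = random.Random(2019)         # fixed seed only to make the sketch repeatable

# Without replacement: no student can appear twice in the sample.
without_repl = rng.sample(population, k=28)

# With replacement: the same student may be drawn more than once.
with_repl = rng.choices(population, k=28)

print("distinct students without replacement:", len(set(without_repl)))
print("distinct students with replacement:   ", len(set(with_repl)))
```

The first count is always 28; the second can be smaller whenever some student is drawn twice.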

    In the case of random sampling, with and without replacement, these estimators have the following expressions:

    Total:

    $$\hat{X} = N \sum_{i=1}^{n} \frac{X_i}{n}.$$

    Average:

    $$\hat{\bar{X}} = \sum_{i=1}^{n} \frac{X_i}{n}.$$

    Proportion:

    $$\hat{P} = \sum_{i=1}^{n} \frac{P_i}{n}.$$

    The proportion is the average of a variable that can only be zero or one. In the expressions above:

    $X_i$ is the value of the variable we are studying.
    $N$ is the size of the population.
    $n$ is the size of the sample.
    $P_i$ is a variable that takes values 0 or 1.
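The three estimators translate directly into code. The sketch below uses illustrative function names and a small made-up list of answers (1 = has an internet connection at home, 0 = does not); neither comes from the text.

```python
def estimate_average(values):
    """Sample average: sum of X_i divided by n."""
    return sum(values) / len(values)


def estimate_total(values, N):
    """Population total: N times the sample average."""
    return N * estimate_average(values)


def estimate_proportion(indicators):
    """Proportion: average of a variable that only takes values 0 or 1."""
    return sum(indicators) / len(indicators)


# Hypothetical answers of 8 sampled students in a school of N = 560.
has_internet = [1, 0, 1, 1, 0, 1, 0, 1]
N = 560

p_hat = estimate_proportion(has_internet)
total_hat = estimate_total(has_internet, N)
print(f"estimated proportion with internet: {p_hat}")
print(f"estimated number of students with internet: {total_hat}")
```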

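    The estimation of the error for these estimators would be:

    Total:

    For sampling with replacement:

    $$\hat{V}(\hat{X}) = N^2 \frac{S^2}{n}.$$

    For sampling without replacement:

    $$\hat{V}(\hat{X}) = N^2 \left(1 - \frac{n}{N}\right) \frac{S^2}{n}.$$

    Average:

    For sampling with replacement:

    $$\hat{V}(\hat{\bar{X}}) = \frac{S^2}{n}.$$

    For sampling without replacement:

    $$\hat{V}(\hat{\bar{X}}) = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}.$$

    Proportion:

    For sampling with replacement:

    $$\hat{V}(\hat{P}) = \frac{\hat{P} \hat{Q}}{n - 1}.$$

    For sampling without replacement:

    $$\hat{V}(\hat{P}) = \left(1 - \frac{n}{N}\right) \frac{\hat{P} \hat{Q}}{n - 1}.$$

    Here $S^2$ is the sample variance of the $X_i$ and $\hat{Q} = 1 - \hat{P}$.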
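A possible way to compute these error estimates is sketched below. It assumes that $S^2$ is the sample variance with divisor $n - 1$, which is what `statistics.variance` returns; the function names and the small data sets are hypothetical.

```python
from statistics import variance


def correction(n, N, with_replacement):
    """Factor (1 - n/N) for sampling without replacement, 1 otherwise."""
    return 1.0 if with_replacement else 1.0 - n / N


def var_average(values, N, with_replacement=False):
    """Estimated variance of the sample average."""
    n = len(values)
    return correction(n, N, with_replacement) * variance(values) / n


def var_total(values, N, with_replacement=False):
    """Estimated variance of the estimated total (N^2 times the above)."""
    return N**2 * var_average(values, N, with_replacement)


def var_proportion(indicators, N, with_replacement=False):
    """Estimated variance of the estimated proportion, P*Q/(n - 1)."""
    n = len(indicators)
    p = sum(indicators) / n
    return correction(n, N, with_replacement) * p * (1 - p) / (n - 1)


# Hypothetical sample of n = 4 values from a population of N = 40.
values = [2, 4, 6, 8]
print(var_average(values, 40, with_replacement=True))
print(var_average(values, 40, with_replacement=False))
```

With $n/N = 0.1$, the estimate without replacement is 10% smaller than the one with replacement, which is the effect of the factor $(1 - n/N)$.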

    1.4 Stratified sampling

    Let us imagine now that we want to make a poll to find out what people in your city do in their spare time. We all know that elderly people do not have the same activities as middle-aged people, such as your parents, for instance. We would like all the information that we already know to help us find a more representative sample. In fact, we are interested in having all these groups represented in our sample. These groups that have been defined (in our example, by age) we will call strata. What we do now is to divide our sample in such a way that we have elements of all the strata.

    Let us define the way we sample in this case. Let us consider that our population of size $N$ is divided into $k$ subpopulations of sizes $N_1, N_2, \ldots, N_k$. These subpopulations are disjoint and verify that $N_1 + N_2 + \cdots + N_k = N$. Each of the subpopulations is called a stratum. If we want a sample of $n$ elements of the initial population, we select from each stratum a sample of size $n_i$ so that $n_1 + n_2 + \cdots + n_k = n$.

    Which advantages and disadvantages does stratified sampling present?

    Advantages:

    We can have more precise information inside the subpopulations about the variables we are studying.

    We can raise the precision of the estimators of the variables of the whole population.

    Disadvantages:

    The sizes of the samples inside each stratum have to be chosen so that the total sample size is $n$.

    It may be difficult in some populations to divide into strata.

    As a general rule, stratified sampling provides better results than random sampling when the strata are more different from each other and more homogeneous internally.

    We can consider 3 methods to distribute the sample size among the strata.

    1. Proportionally to the size of each stratum, i.e., if the $j$-th stratum has size $N_j$, then the sample from this stratum will have size $n (N_j / N)$, where $N$ is the size of the population and $n$ the size of the sample.

    2. Proportionally to the variability of the parameter we are considering in each stratum. For instance, if we know that the variance of the height among the male students is 15 cm and among the female students is 5 cm, the proportion of male to female students is 3 to 1 and the sample should keep that proportion.

    3. We assign the same size to each stratum. As a consequence we favor the smaller strata, and the contrary happens with the bigger ones, in terms of precision.
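The three allocation rules can be sketched as follows. The function names and the two-stratum figures are hypothetical; the functions return unrounded quotas, and how to round them to whole numbers is left open here.

```python
def proportional_allocation(n, stratum_sizes):
    """Method 1: n_j = n * N_j / N."""
    N = sum(stratum_sizes)
    return [n * N_j / N for N_j in stratum_sizes]


def variability_allocation(n, variabilities):
    """Method 2: sample sizes in the same ratio as the stratum variabilities."""
    total = sum(variabilities)
    return [n * s / total for s in variabilities]


def equal_allocation(n, k):
    """Method 3: the same sample size in each of the k strata."""
    return [n / k] * k


# Hypothetical population with two strata of 100 and 300 elements, n = 40.
print(proportional_allocation(40, [100, 300]))
# Variabilities 15 and 5, as in the height example: a 3 to 1 split.
print(variability_allocation(40, [15, 5]))
print(equal_allocation(40, 2))
```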

    For the case of stratified sampling, the main estimators are the following:

    Total:

    $$\hat{X} = \sum_{h=1}^{k} N_h \bar{X}_h.$$

    Average:

    $$\hat{\bar{X}} = \sum_{h=1}^{k} w_h \bar{X}_h = \sum_{h=1}^{k} \frac{N_h}{N} \bar{X}_h.$$

    Proportion:

    $$\hat{P} = \sum_{h=1}^{k} w_h \hat{P}_h,$$

    where

    $\bar{X}_h$ is the sample average for variable $X$ in stratum $h$.
    $N_h$ is the size of stratum $h$.
    $N$ is the size of the population.
    $n_h$ is the sample size in stratum $h$.
    $n$ is the sample size.
    $\hat{P}_h$ is the sample proportion of the variable in stratum $h$,

    and the estimation of the error we make when we estimate the population parameters is:

    Total:

    $$\hat{V}(\hat{X}) = \sum_{h=1}^{k} N_h^2 (1 - f_h) \frac{\hat{S}_h^2}{n_h},$$

    with

    $$f_h = \frac{n_h}{N_h} \quad \text{and} \quad \hat{S}_h^2 = \frac{n_h}{n_h - 1} \left[ \frac{1}{n_h} \sum_{i=1}^{n_h} X_{hi}^2 - \bar{X}_h^2 \right].$$

    Average:

    $$\hat{V}(\hat{\bar{X}}) = \sum_{h=1}^{k} w_h^2 (1 - f_h) \frac{\hat{S}_h^2}{n_h},$$

    where $w_h$, $f_h$ and $\hat{S}_h^2$ are the same as before.

    Proportion:

    $$\hat{V}(\hat{P}) = \sum_{h=1}^{k} w_h^2 (1 - f_h) \frac{\hat{P}_h \hat{Q}_h}{n_h - 1},$$

    where $\hat{Q}_h = 1 - \hat{P}_h$.
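As an illustration, the stratified average and its estimated variance can be computed as below. This is a sketch with hypothetical function names and made-up data; it relies on the fact that `statistics.variance` (divisor $n_h - 1$) equals the quantity $\hat{S}_h^2$ defined above.

```python
from statistics import mean, variance


def stratified_average(samples, sizes):
    """Sum over strata of w_h * (sample average), with w_h = N_h / N."""
    N = sum(sizes)
    return sum(N_h / N * mean(s) for s, N_h in zip(samples, sizes))


def stratified_average_variance(samples, sizes):
    """Sum over strata of w_h^2 * (1 - f_h) * S_h^2 / n_h, with f_h = n_h / N_h."""
    N = sum(sizes)
    total = 0.0
    for s, N_h in zip(samples, sizes):
        n_h = len(s)
        w_h = N_h / N
        f_h = n_h / N_h
        total += w_h**2 * (1 - f_h) * variance(s) / n_h
    return total


# Hypothetical population with two strata of sizes 20 and 80.
sizes = [20, 80]
samples = [[1, 3], [10, 12, 14, 16]]

print(stratified_average(samples, sizes))
print(stratified_average_variance(samples, sizes))
```

The estimated total and its variance follow by multiplying the average by $N$ and its variance by $N^2$, since $N w_h = N_h$.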

    1.5 Cluster sampling

    We now think about making a poll to study the average height of the students of the high schools of our city. Instead of sampling among all the students of the city, we could consider the possibility of choosing some quarters because, with respect to height, quarters are like small populations that we can compare to the city. In this case, can we simplify the choice of the sample so that we choose quarters without losing accuracy? The answer is that in this case we could choose quarters and analyze the height without losing accuracy. Let us present the sampling method which allows that.

    In cluster sampling, the population is divided into units or groups, called clusters (usually they are units or areas into which the population has already been divided), which should be as representative as possible of the population, i.e., they should represent the heterogeneity of the population we are studying and they should be homogeneous among them.

    The reason to use this kind of sampling is that sometimes it is too expensive to make a complete list of all the elements of the population that we want to study, or that by the time we finish making the list it may make no sense to carry out the study.

    The main disadvantage is that if the clusters are not homogeneous among them, the final sample may not be representative of the population.

    If we suppose that the clusters are as heterogeneous as the population, with respect to the variable we are considering, and that the clusters are homogeneous among them, then to get a sample we only have to choose some clusters. We say that we make cluster sampling in one stage.

    This sampling method has the advantage that it simplifies the collection of the sample information.

    Let us see now the expressions of the estimators for this sampling technique:

    Total:

    $$\hat{X} = M \frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} M_i}.$$

    Average:

    $$\hat{\bar{X}} = \frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} M_i}.$$

    Proportion:

    $$\hat{P} = \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} M_i},$$

    where

    $X_i$ is the total of variable $X$ in cluster $i$.
    $\bar{X}_i$ is the sample average of variable $X$ in cluster $i$.
    $N$ is the number of clusters of the population.
    $M$ is the size of the population.
    $n$ is the number of clusters of the sample.
    $M_i$ is the size of cluster $i$.
    $A_i$ is the total of variable $A$, which takes values 0 or 1, in cluster $i$,

    and the estimations of the errors we make when we estimate through these expressions are:

    Total:

    $$\hat{V}(\hat{X}) = \frac{N(N - n)}{n} \, \frac{1}{n - 1} \sum_{i=1}^{n} \left( X_i - \hat{\bar{X}} M_i \right)^2.$$

    Average:

    $$\hat{V}(\hat{\bar{X}}) = \frac{N(N - n)}{M^2 n} \, \frac{1}{n - 1} \sum_{i=1}^{n} \left( X_i - \hat{\bar{X}} M_i \right)^2.$$

    Proportion:

    $$\hat{V}(\hat{P}) = \frac{N(N - n)}{M^2 n} \, \frac{1}{n - 1} \sum_{i=1}^{n} \left( A_i - \hat{P} M_i \right)^2.$$
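The cluster estimators for the average and the total, and the error estimate for the average, can be sketched as follows. The function names and the three sampled clusters are hypothetical, chosen so that the arithmetic is easy to follow.

```python
def cluster_average(totals, sizes):
    """Sum of the cluster totals X_i divided by the sum of the cluster sizes M_i."""
    return sum(totals) / sum(sizes)


def cluster_total(totals, sizes, M):
    """Population size M times the estimated average."""
    return M * cluster_average(totals, sizes)


def cluster_average_variance(totals, sizes, N, M):
    """N(N - n) / (M^2 n) * 1/(n - 1) * sum of (X_i - average * M_i)^2."""
    n = len(totals)
    avg = cluster_average(totals, sizes)
    ss = sum((x - avg * m) ** 2 for x, m in zip(totals, sizes))
    return N * (N - n) / (M**2 * n) * ss / (n - 1)


# Hypothetical city: N = 10 clusters, M = 200 students, n = 3 clusters sampled.
sizes = [20, 25, 15]           # M_i: students in each sampled cluster
totals = [3300, 4200, 2400]    # X_i: sum of heights (cm) in each sampled cluster

print(cluster_average(totals, sizes))
print(cluster_total(totals, sizes, M=200))
print(cluster_average_variance(totals, sizes, N=10, M=200))
```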

    1.6 Systematic sampling

    We can think about a different way of sampling. Let us imagine that in your high school there are 560 students and we have decided to choose 28 people. In this case, the elevation factor would be 560/28 = 20. We number the students from 1 to 560. We then choose a number $x$ randomly from 1 to 20, and this would be the first student selected. Then we select numbers $x + 20$, $x + 2 \cdot 20$ and so on. It is not random sampling because not all samples are equally probable. Let us define this sampling technique.

    Let us suppose that we have a population of $N$ elements, ordered and numbered from 1 to $N$, and we want to get a sample with $n$ elements. This population can be divided into $n$ subsets, each of them with $v = N/n$ elements, i.e., each subset has as many elements as the elevation factor indicates. We randomly choose a numbered element from $1, 2, \ldots, N/n$ and call it $x_0$, and then we take the following elements: $x_0 + v, x_0 + 2v, x_0 + 3v, x_0 + 4v, \ldots$

    In case $v$ is not a natural number, we round it to the closest (lower) one, so some samples may have size $n - 1$. This fact brings a small perturbation into the theory of systematic sampling, which we do not have to take into account if $n > 50$.

    This type of sampling requires that we have previously checked that the ordered elements present no periodicity in the variables we want to study, because if there is periodicity and its period is close to the value $v$, the results that we obtain would have a big bias and would not be valid.

    Systematic sampling is equivalent to random sampling if the elements are numbered in a random way.
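The selection procedure can be sketched in a few lines. The function name `systematic_sample` is hypothetical, and the sketch takes the random start plus the next $n - 1$ steps of length $v$; it is shown on the 560/28 example, where $v$ is a whole number.

```python
import random


def systematic_sample(N, n, rng=random):
    """Random start x0 in 1..v, then every v-th element, with v = N // n."""
    v = N // n                  # elevation factor (rounded down if not exact)
    x0 = rng.randint(1, v)      # the only random choice in the whole procedure
    return [x0 + j * v for j in range(n)]


# High school example: 28 students out of 560, so v = 20.
sample = systematic_sample(560, 28, rng=random.Random(7))
print(sample)
```

Only the first element is random; once it is fixed, the whole sample is determined, which is why there are just $v$ possible samples.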

    Advantages of this method are:

    1. It extends the sample to all the population.

    2. It is very easy to apply.

    Disadvantages of the method are:

    1. Increase of the variance if there is periodicity in the numbering of the elements, and the appearance of bias due to selection.

    2. Problems when we want to estimate the variance.

    We can consider systematic sampling as an instance of cluster sampling, each cluster having the following elements, represented by their number in the list:

    First cluster: $1, 1 + v, 1 + 2v, 1 + 3v, 1 + 4v, \ldots$
    Second cluster: $2, 2 + v, 2 + 2v, 2 + 3v, 2 + 4v, \ldots$
    ...
    $v$-th cluster: $v, 2v, 3v, 4v, \ldots, nv$.

    Selecting a systematic sample is equivalent to randomly selecting only one cluster. To do so, it is necessary that each of the clusters has a similar structure to the population.

    We can also consider systematic sampling as a particular case of stratified sampling with $n$ strata, each of them with $v$ elements, so that we choose only one element of each stratum. In stratified sampling the selected element is random, while in this technique we choose randomly the first element and the rest are determined by the factor $v$.

    The estimators for this type of sampling are:

    Total:

    $$\hat{X} = v \sum_{i=1}^{n} X_i.$$

    Average:

    $$\hat{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

    Proportion:

    $$\hat{P} = \frac{1}{n} \sum_{i=1}^{n} P_i,$$

    where $P_i$ is a variable taking values 0 or 1.

    1.7 Other sampling techniques

    Two-stage sampling is a particular case of cluster sampling in which, in the second stage, we do not select all the elements of the cluster, but some elements chosen in a random way. Clusters in the first stage are called primary units and the ones in the second stage are secondary units.

    Multistage sampling is a generalization of the previous technique, so that each cluster can be a group of clusters and so on in each stage.

    In general, complicated studies use the concepts of stratification, clusters and random sampling together. For instance, the population of a country can be divided into clusters (provinces, cities, quarters) that can be heterogeneous inside (for instance, with respect to consumption) but homogeneous among them. Afterwards it is necessary to divide these units into homogeneous strata (primary units, for instance, quarters). Each of these units is divided into new units (buildings) called secondary units, which are divided into flats (houses). We would choose our sample in the following way:

    1. We select a stratified sample. We would take at least one stratum (one quarter).

    2. We choose randomly some buildings in each of the selected quarters.

    3. We take randomly one or several houses in each of the buildings selected.
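The idea of two-stage sampling can be illustrated with a short sketch. Everything in it is hypothetical: the function name, the building and flat labels, and the choice of drawing both stages without replacement.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """clusters maps each primary unit to the list of its elements."""
    # First stage: choose some primary units at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Second stage: choose some elements at random inside each chosen unit.
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }


# Hypothetical quarter: buildings (primary units) and their flats.
buildings = {
    "building A": ["A1", "A2", "A3", "A4"],
    "building B": ["B1", "B2", "B3"],
    "building C": ["C1", "C2", "C3", "C4", "C5"],
    "building D": ["D1", "D2"],
}

picked = two_stage_sample(buildings, n_clusters=2, n_per_cluster=2,
                          rng=random.Random(3))
print(picked)
```

Repeating the second stage inside the selected elements (for instance, people inside each flat) would give a multistage design.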


    Chapter 2

    An example of the application of sampling techniques

    We have decided to make a study in a high school. We want to have data about the number of left-handed students, the number of students who have an internet connection at home, the height of the students and the pocket money they receive weekly.

    The usefulness of knowing the number of left-handed students of a high school is easy to understand, because the high school should have appropriate equipment for them, for instance adapted chairs.

    Internet connection at home is important information. It can be used not only to check whether it is possible to offer some material to the students through the internet, but also to know whether they can access other didactic information available on the web.

    The study of height is classical. It is anyway interesting to know whether height is changing over the years and the population is getting taller.

    Pocket money is socially relevant data. It is interesting to know how much money the students deal with, and also how they spend it, in order to understand what they devote their time to.

    Once we have fixed what we want to get, we decide to sample in order to draw conclusions about all the students of the high school without asking each of them. The information available to us is the distribution of students in years and classes:

                A    B    C    D    E    Total
    1st year    33   20   -    -    -    53
    2nd year    20   15   30   -    -    65
    3rd year    20   15   26   14   -    75
    4th year    27   27   25   -    -    79
    5th year    33   28   30   31   23   145
    6th year    30   34   32   31   -    127

    So we are working with a population of 544 students of a high school.

    We start by supposing that we are going to use a sample size of around 60 students, which is the maximum allowed and which we think may be enough for the study we are going to make.


    We can then get the first information. Our sampling fraction would be:

    $$f = \frac{n}{N} = \frac{60}{544} = 0.1103,$$

    that is, we are going to sample approximately 11% of the population. We can also calculate the elevation factor, which would be:

    $$E = \frac{N}{n} = \frac{544}{60} = 9.1,$$

    or, equivalently, each student interviewed represents 9 colleagues.

    Now we have to decide which method we want to use to sample the different characteristics we are going to study. Let us denote them in the following way:

    X will represent the height.

    Y will represent the pocket money.

    Z will represent the variable being left-handed, which takes value 1 if a student is left-handed and 0 if he/she is not left-handed.

    I will represent the variable having an internet connection at home, which takes value 1 in the affirmative case and 0 in the negative case.

    We will split the 4 variables into 2 cases. First, we ask ourselves a question: our population is divided into groups and levels; can we consider that this division has an influence on any of these variables? That is, can we consider that in each level, for instance, the average height can change? The answer to this question is that it is logical to think that it will change. A priori, we can suppose that age has an important influence on height. And on pocket money? Age is also important there, because we all could get more money from our parents as we get older. Does the same happen for being left-handed? The answer is no, because if you are left-handed, you have been so since the day you were born, so age has no influence on this. And the same applies to having an internet connection at home. So we choose different sampling techniques for these two cases.

    Case I: variables pocket money and height

    We have already mentioned that we have the population divided into levels and groups. For us, the division into levels is a division into strata because the levels are homogeneous inside with respect to age (and we can also think that the same happens for pocket money and height) and, as we have said before, age has a big influence on these variables, so it makes sense that we are interested in having all these strata represented in our sample. So we choose for these cases stratified random sampling.

    The next thing to b e d one is to dec ide the sample size ins ide eac h strat a.W e ha v e 6 strata with the follo wing siz es:

    13

  • 8/3/2019 Sampling En

    16/19

    Stratus Siz e

    1st l ev el (str atus 1)N1 = 532nd lev el (stratus 2)N2 = 65

    3rd lev el (stratu s 3)N3 = 754th l ev el (str atus 4)N4 = 79

    5th l ev el (str atus 5)N5 = 1 45

    6th l ev el (str atus 6)N6 = 1 27

    The usual thing in this situation is to use samp le size in the s trata prop or tiona l to their siz e, sothat the siz es of the s ampl es k eep the s ame prop ortion than the siz es of the strata.W e c a lculatethe n the size of the sample in eac h stratus through the follo wing e xpression:

    ni = n Ni

    N,

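    and we get the following sample sizes:

    $n_1 = 60 \cdot \frac{53}{544} \approx 5.84$, so we take $n_1 = 6$,
    $n_2 = 60 \cdot \frac{65}{544} \approx 7.16$, so we take $n_2 = 8$,
    $n_3 = 60 \cdot \frac{75}{544} \approx 8.27$, so we take $n_3 = 8$,
    $n_4 = 60 \cdot \frac{79}{544} \approx 8.71$, so we take $n_4 = 8$,
    $n_5 = 60 \cdot \frac{145}{544} \approx 15.99$, so we take $n_5 = 16$,
    $n_6 = 60 \cdot \frac{127}{544} \approx 14.00$, so we take $n_6 = 14$,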

    where the rounding has been adjusted so that the sample sizes add up to the total of 60 that we had set. So we have the sample sizes that we needed, and we can carry out random sampling inside each stratum to select the number of students that we have already decided.
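The proportional allocation can be checked with a short Python sketch. It only reproduces the raw values $n \cdot N_i / N$ from the stratum sizes given in the text; the final integer sizes are a choice made by hand in the text, and the variable names here are illustrative.

```python
# Stratum sizes N_i taken from the table in the text.
stratum_sizes = [53, 65, 75, 79, 145, 127]
total_sample = 60  # overall sample size n fixed in the text

population = sum(stratum_sizes)  # N, the number of students in the school

# Raw proportional allocation n_i = n * N_i / N (not yet whole numbers).
raw_allocation = [total_sample * size / population for size in stratum_sizes]

for level, (size, raw) in enumerate(zip(stratum_sizes, raw_allocation), start=1):
    print(f"stratum {level}: N_i = {size:3d}, n * N_i / N = {raw:.2f}")
```

The raw values always add up to $n$; turning them into whole numbers that still add up to 60 needs a rounding decision like the one described above.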

    Our data are the following. For the height we got:

    Stratum 1: 165 161 153 150 151 153
    Stratum 2: 157 161 168 162 165 171 169 164
    Stratum 3: 168 165 175 175 165 163 165 165
    Stratum 4: 164 171 177 163 170 165 160 175
    Stratum 5: 175 173 161 158 175 164 158 161 158 171 175 170 187 168 170 185
    Stratum 6: 190 178 194 183 165 170 176 173 168 183 173 183 174 177

    and for the pocket money:

    Stratum 1: 10 0 3.5 0 0 3
    Stratum 2: 0 5 0 15 0 3 2 0
    Stratum 3: 5 8 8 0 20 5 10 10
    Stratum 4: 12 6 5 12 12 6 0 0
    Stratum 5: 5 10 12 15 10 12 30 12 30 10 6 5 10 21 40 15
    Stratum 6: 12 10 9 6 8 9.4 15 0 20 10 15 10 0 0

    We now proceed to the estimations. The first thing to do is to calculate the average in each stratum, which gives us information about the behavior of the variables in the strata. Later on, we will calculate the average height and pocket money of the students of the high school, and we will give it together with an estimation of the error we make with such an estimation. We carry out the process independently for each of the variables:

    For the height we have:

    Stratum    Average                       Variance
    1          $\bar{x}_1 = 155.5$           $S^2_{x_1} = 36.7$
    2          $\bar{x}_2 = 164.625$         $S^2_{x_2} = 21.4107$
    3          $\bar{x}_3 = 167.625$         $S^2_{x_3} = 22.5535$
    4          $\bar{x}_4 = 168.125$         $S^2_{x_4} = 36.6964$
    5          $\bar{x}_5 = 169.3125$        $S^2_{x_5} = 81.6958$
    6          $\bar{x}_6 = 177.642857$      $S^2_{x_6} = 67.478$

    We can directly see something curious: the average increases as the level increases. This leads us to think that the choice of stratified sampling has been right in this case.
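As a minimal sketch, the stratum averages and the $S^2$ values for the height can be recomputed from the sample data listed above. It assumes that $S^2$ is the sample variance with divisor $n_h - 1$, which is what `statistics.variance` computes; the dictionary name `heights` is illustrative.

```python
from statistics import mean, variance

# Height observations per stratum, copied from the data listed in the text.
heights = {
    1: [165, 161, 153, 150, 151, 153],
    2: [157, 161, 168, 162, 165, 171, 169, 164],
    3: [168, 165, 175, 175, 165, 163, 165, 165],
    4: [164, 171, 177, 163, 170, 165, 160, 175],
    5: [175, 173, 161, 158, 175, 164, 158, 161,
        158, 171, 175, 170, 187, 168, 170, 185],
    6: [190, 178, 194, 183, 165, 170, 176, 173,
        168, 183, 173, 183, 174, 177],
}

# (average, sample variance with divisor n_h - 1) for each stratum.
stratum_stats = {
    level: (mean(values), variance(values))
    for level, values in heights.items()
}

for level, (average, s2) in stratum_stats.items():
    print(f"stratum {level}: average = {average:.4f}, S^2 = {s2:.4f}")
```

Comparing the printed averages from one stratum to the next makes the increase with the level easy to see.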

    We now calculate the same for the pocket money:

    Stratum    Average                  Variance
    1          $\bar{y}_1 = 2.75$       $S^2_{y_1} = 15.175$
    2          $\bar{y}_2 = 3.125$      $S^2_{y_2} = 26.4107$
    3          $\bar{y}_3 = 8.25$       $S^2_{y_3} = 33.3571$
    4          $\bar{y}_4 = 6.625$      $S^2_{y_4} = 25.4107$
    5          $\bar{y}_5 = 15.1875$    $S^2_{y_5} = 101.2291$
    6          $\bar{y}_6 = 8.8857$     $S^2_{y_6} = 35.229$

    Now we calculate the estimated average from the complete sample, and the estimation of the error in terms of the estimation of the variance, for the 2 variables we are studying. For the height:

    $$ \hat{X} = \sum_{h=1}^{6} w_h \bar{x}_h = \sum_{h=1}^{6} \frac{N_h}{N} \bar{x}_h = \frac{53}{544} \cdot 155.5 + \frac{65}{544} \cdot 164.625 + \frac{75}{544} \cdot 167.625 + \frac{79}{544} \cdot 168.125 + \frac{145}{544} \cdot 169.3125 + \frac{127}{544} \cdot 177.642857 = 168.9463. $$

    The expression for the variance is

    $$ \hat{V}(\hat{X}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h}, $$

    and in our case we have:

    Stratum    $w_h$                 $w_h^2$    $f_h$                $1 - f_h$
    1          53/544 = 0.0974       0.0095     6/53 = 0.1132        0.8868
    2          65/544 = 0.1195       0.0143     8/65 = 0.1231        0.8769
    3          75/544 = 0.1379       0.0190     8/75 = 0.1067        0.8933
    4          79/544 = 0.1452       0.0211     8/79 = 0.1013        0.8987
    5          145/544 = 0.2665      0.0710     16/145 = 0.1103      0.8897
    6          127/544 = 0.2335      0.0545     14/127 = 0.1102      0.8898

    Now we substitute these numbers in the previous expression and we get:

    $$ \hat{V}(\hat{X}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h} = 0.0095 \cdot 0.8868 \cdot \frac{36.7}{6} + 0.0143 \cdot 0.8769 \cdot \frac{21.4107}{8} + 0.0190 \cdot 0.8933 \cdot \frac{22.5535}{8} + 0.0211 \cdot 0.8987 \cdot \frac{36.6964}{8} + 0.0710 \cdot 0.8897 \cdot \frac{81.6958}{16} + 0.0545 \cdot 0.8898 \cdot \frac{67.478}{14} \approx 0.776. $$

    So in the case of the height we already have our estimations. The estimated average height is 168.9463, and the estimated variance of this estimate, which measures our error, is 0.776.
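The two stratified formulas can be sketched in Python as follows. The function name and argument names are illustrative; the inputs are the stratum sizes, sample sizes, averages and $S^2$ values from the tables for the height, and the weights $w_h = N_h / N$ and sampling fractions $f_h = n_h / N_h$ are computed inside the function instead of being rounded first.

```python
def stratified_mean_and_variance(sizes, sample_sizes, means, variances):
    """Stratified estimate of the mean and the estimated variance of that estimate.

    Uses w_h = N_h / N, f_h = n_h / N_h and
    V = sum over h of w_h^2 * (1 - f_h) * S_h^2 / n_h.
    """
    population = sum(sizes)
    estimate = 0.0
    variance_estimate = 0.0
    for N_h, n_h, mean_h, s2_h in zip(sizes, sample_sizes, means, variances):
        w_h = N_h / population   # stratum weight
        f_h = n_h / N_h          # sampling fraction in the stratum
        estimate += w_h * mean_h
        variance_estimate += w_h ** 2 * (1 - f_h) * s2_h / n_h
    return estimate, variance_estimate


# Summary values for the height, taken from the tables in the text.
sizes = [53, 65, 75, 79, 145, 127]
sample_sizes = [6, 8, 8, 8, 16, 14]
height_means = [155.5, 164.625, 167.625, 168.125, 169.3125, 177.642857]
height_variances = [36.7, 21.4107, 22.5535, 36.6964, 81.6958, 67.478]

height_estimate, height_variance = stratified_mean_and_variance(
    sizes, sample_sizes, height_means, height_variances
)
print(f"estimated average height: {height_estimate:.4f}")
print(f"estimated variance of the estimate: {height_variance:.4f}")
```

Working with unrounded weights avoids the small differences that appear when $w_h^2$ is rounded to a few decimals before substituting.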

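    Now we make the same calculation for the pocket money. We start by calculating the estimated average:

    $$ \hat{Y} = \sum_{h=1}^{6} w_h \bar{y}_h = \sum_{h=1}^{6} \frac{N_h}{N} \bar{y}_h = \frac{53}{544} \cdot 2.75 + \frac{65}{544} \cdot 3.125 + \frac{75}{544} \cdot 8.25 + \frac{79}{544} \cdot 6.625 + \frac{145}{544} \cdot 15.1875 + \frac{127}{544} \cdot 8.8857 = 8.8633. $$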

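    The estimation of the variance can be calculated directly, because the values of $w_h = N_h / N$ and $f_h = n_h / N_h$ are the same as for the height:

    $$ \hat{V}(\hat{Y}) = \sum_{h=1}^{k} w_h^2 \, (1 - f_h) \, \frac{\hat{S}_h^2}{n_h} = \left(\tfrac{53}{544}\right)^2 \left(1 - \tfrac{6}{53}\right) \frac{15.175}{6} + \left(\tfrac{65}{544}\right)^2 \left(1 - \tfrac{8}{65}\right) \frac{26.4107}{8} + \left(\tfrac{75}{544}\right)^2 \left(1 - \tfrac{8}{75}\right) \frac{33.3571}{8} + \left(\tfrac{79}{544}\right)^2 \left(1 - \tfrac{8}{79}\right) \frac{25.4107}{8} + \left(\tfrac{145}{544}\right)^2 \left(1 - \tfrac{16}{145}\right) \frac{101.2291}{16} + \left(\tfrac{127}{544}\right)^2 \left(1 - \tfrac{14}{127}\right) \frac{35.229}{14} \approx 0.716. $$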
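For the pocket money, the whole calculation can also be run in one pass from the raw observations. This is only a sketch: it assumes sample variances with divisor $n_h - 1$ and uses illustrative names, with the stratum sizes and the data copied from the text.

```python
from statistics import mean, variance

# Stratum sizes N_h and pocket-money observations per stratum, from the text.
stratum_sizes = {1: 53, 2: 65, 3: 75, 4: 79, 5: 145, 6: 127}
pocket_money = {
    1: [10, 0, 3.5, 0, 0, 3],
    2: [0, 5, 0, 15, 0, 3, 2, 0],
    3: [5, 8, 8, 0, 20, 5, 10, 10],
    4: [12, 6, 5, 12, 12, 6, 0, 0],
    5: [5, 10, 12, 15, 10, 12, 30, 12, 30, 10, 6, 5, 10, 21, 40, 15],
    6: [12, 10, 9, 6, 8, 9.4, 15, 0, 20, 10, 15, 10, 0, 0],
}

population = sum(stratum_sizes.values())

pocket_estimate = 0.0
pocket_variance = 0.0
for level, values in pocket_money.items():
    N_h = stratum_sizes[level]
    n_h = len(values)
    w_h = N_h / population   # stratum weight
    f_h = n_h / N_h          # sampling fraction
    pocket_estimate += w_h * mean(values)
    pocket_variance += w_h ** 2 * (1 - f_h) * variance(values) / n_h

print(f"estimated average pocket money: {pocket_estimate:.4f}")
print(f"estimated variance of the estimate: {pocket_variance:.4f}")
```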

    Case II: variables being left-handed and having an internet connection at home

    Now we want to study the variables being left-handed and having an internet connection at home. It is easy to see that the division into strata is not useful in this case, so we should think about using some other sampling technique. We still want to get a sample of around 60 students. We could think that, with respect to these variables, the groups into which the population is divided behave like small populations, i.e., we can consider that each group behaves like the whole high school. Moreover, the possibility of sampling whole groups is attractive for us, because selecting a random sample of students, finding them and interviewing them is not an easy task.

    But now, what are the groups for us? We have already said that, inside, each group behaves like a small population with respect to our variables, while the groups are similar to one another. This means that we have the population divided into clusters, so we will apply cluster sampling to this situation.

    The next thing to do is to decide the number of groups to be sampled. We know that the groups do not all have the same size, but 2 or 3 groups would assure us a sample of around 60 students. To avoid the possibility of drawing 2 small groups and ending up with a sample that is too small for our purposes, we decide to select 3 groups from the high school.
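Choosing the 3 groups at random could look like the following sketch. The number of groups (21) and the integer labels are assumptions made only for illustration, and the seed is fixed just to make the draw reproducible.

```python
import random


def choose_clusters(group_labels, how_many, seed=None):
    """Draw `how_many` distinct groups at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(group_labels, how_many))


# Hypothetical labelling: groups numbered 1..21 (assumed, not given here).
group_labels = list(range(1, 22))
chosen = choose_clusters(group_labels, 3, seed=2019)
print("groups selected for the cluster sample:", chosen)
```

Every student of the chosen groups is then interviewed, so the final sample size depends on which groups are drawn.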


    So the data we have got are the following. For the variable being left-handed:

    Cluster 1: 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
    Cluster 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    Cluster 3: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0

    where 1 means being left-handed and 0 means not being left-handed. Now, for the variable having an internet connection at home, we got:

    Cluster 1: 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 0 1 0 0
    Cluster 2: 1 1 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 0
    Cluster 3: 1 1 0 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1

    where now 1 means having an internet connection at home and 0 means not having one.

    We now start estimating the total number of left-handed students and the proportion of left-handed students, as well as the total number of students having an internet connection at home and the proportion that this represents in the whole high school.

    We calculate the total and the proportion for each group and variable:

               Left-handed              Internet
    Cluster    Total    Proportion      Total    Proportion
    1          3        0.15            10       0.5
    2          0        0               17       0.7391
    3          2        0.08            20       0.8
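The per-cluster totals and proportions can be recomputed from the 0/1 data with a small sketch like this one; the data are copied from the lists above, and the function and variable names are illustrative.

```python
# 0/1 observations per cluster, copied from the data in the text.
left_handed = {
    1: [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    2: [0] * 23,
    3: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 0, 0, 0],
}
internet = {
    1: [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
    2: [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 0],
    3: [1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
        0, 1, 1, 1, 1],
}


def cluster_summary(clusters):
    """Return {cluster: (total of ones, proportion of ones)}."""
    return {
        label: (sum(values), sum(values) / len(values))
        for label, values in clusters.items()
    }


left_summary = cluster_summary(left_handed)
internet_summary = cluster_summary(internet)

for label in left_handed:
    print(label, left_summary[label], internet_summary[label])
```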

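    Now we can calculate the estimations for the proportion and the total of the variables $Z$ and $I$. We start with variable $Z$:

    $$ \hat{Z} = M \, \frac{\sum_{i=1}^{n} \hat{Z}_i}{\sum_{i=1}^{n} M_i} = 544 \cdot \frac{\sum_{i=1}^{3} \hat{Z}_i}{\sum_{i=1}^{3} M_i} = 544 \cdot \frac{3 + 0 + 2}{20 + 23 + 25} = 544 \cdot \frac{5}{68} = 40, $$

    $$ \hat{P}_Z = \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} M_i} = \frac{3 + 0 + 2}{20 + 23 + 25} = \frac{5}{68} = 0.0735, $$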

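    and we do the same for variable $I$:

    $$ \hat{I} = M \, \frac{\sum_{i=1}^{n} \hat{I}_i}{\sum_{i=1}^{n} M_i} = 544 \cdot \frac{\sum_{i=1}^{3} \hat{I}_i}{\sum_{i=1}^{3} M_i} = 544 \cdot \frac{10 + 17 + 20}{20 + 23 + 25} = 544 \cdot \frac{47}{68} = 376, $$

    $$ \hat{P}_I = \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} M_i} = \frac{10 + 17 + 20}{20 + 23 + 25} = \frac{47}{68} = 0.6911. $$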
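The estimators of the total and the proportion translate directly into code. In this sketch the function name is illustrative, and the inputs are the cluster totals, the cluster sizes $M_i$ and the population size $M = 544$ from the text.

```python
def ratio_estimates(cluster_totals, cluster_sizes, population_size):
    """Estimate the population total and proportion from sampled clusters.

    proportion = (sum of cluster totals) / (sum of cluster sizes)
    total      = population_size * proportion
    """
    proportion = sum(cluster_totals) / sum(cluster_sizes)
    return population_size * proportion, proportion


cluster_sizes = [20, 23, 25]   # M_i for the three sampled groups
population_size = 544          # M, the number of students in the school

left_total, left_proportion = ratio_estimates([3, 0, 2], cluster_sizes, population_size)
internet_total, internet_proportion = ratio_estimates([10, 17, 20], cluster_sizes, population_size)

print(f"left-handed: total = {left_total:.0f}, proportion = {left_proportion:.4f}")
print(f"internet:    total = {internet_total:.0f}, proportion = {internet_proportion:.4f}")
```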

    We now continue by estimating the error we have made for the variable being left-handed:

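    $$ \hat{V}(\hat{Z}) = \frac{N(N - n)}{n} \, \frac{1}{n - 1} \sum_{i=1}^{n} (Z_i - \bar{Z} M_i)^2 = \frac{21 \, (21 - 3)}{3} \cdot \frac{1}{2} \left[ (3 - 0.0735 \cdot 20)^2 + (0 - 0.0735 \cdot 23)^2 + (2 - 0.0735 \cdot 25)^2 \right] $$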
