a note to teachers - life sciences rlhs · williamson and jen pfannerstill, paul k. strode, valerie...

Post on 24-Mar-2020


A Note to Teachers

This statistics primer was designed with my students in mind; it may not be appropriate for your students. I made it because my incoming class will consist mostly of students who are very behind in their math skills and are enrolled in lower-level math classes. It may not be appropriate for higher-level students. I have tried to put it together in a way that explains statistics intuitively without too much math (the way I wish someone had taught me statistics), so that I can lay the foundation to build up their quantitative reasoning skills. I have them do some practice with data in Excel so they can get comfortable. The data is from Brad Williamson's Artificial Selection lab; you can find it under the resource section of the AP Training Module on quantitative methods. All I did was copy his data into a new spreadsheet, just the data and not the graphs, because I want them to practice making graphs. I also use the HHMI Click and Learn resource Sampling and Normal Distribution.

This is only my Module 1; in Module 2 (still in the works) I plan on introducing statistical hypothesis testing. I plan on using Module 1 at the beginning of the year, but to keep going back to it and to incorporate as much data analysis and statistics as possible. In addition to doing labs and analyzing their own data, we will use other scientists' data to practice, from sources like Data Nuggets, HHMI Data Points and some scientific articles.

Below you will find some links that I thought are very useful, as well as references I have used to put this primer together. I am not trying to re-invent the wheel; great teachers before me have built it. I am just adjusting it to the needs of my students. This would not have seen the light of day if it wasn't for Brad Williamson and Jen Pfannerstill, Paul K. Strode, Valerie Bolster May, Pam Close, Ann Brokaw and Rayan Reardon, to name just a few. Feel free to use any or all of it, but please give credit where credit is due.

Dessy Dimova, Ph.D.; Barnegat High School, Barnegat, NJ

References and helpful links:

Wonnacott, RJ, and TH Wonnacott. Introductory Statistics. John Wiley & Sons, Inc., 1969.
Williamson, B, and J Pfannerstill. "AP Biology Quantitative Methods: An Introduction to Descriptive Statistics." AP Biology-Quantitative Methods-Final-v3, College Board, cb.collegeboard.org/ap-training-modules/ap-biology/quantitative-methods/story_html5.html.
Cumming, Geoff. "Error Bars in Experimental Biology." The Journal of Cell Biology, vol. 177, no. 1, 9 Apr. 2007, pp. 7–11. JSTOR, www.jstor.org/stable/10.2307/30049848?ref=search-gateway:4ea19d3397e6165bff89292b29e5a97f.
Forthofer, Ronald N., and Eun Sul Lee. Introduction to Biostatistics: a Guide to Design, Analysis and Discovery. Academic Press, 1995.
Krzywinski, Martin, and Naomi Altman. "Points of Significance: Error Bars." Nature News, Nature Publishing Group, 27 Sept. 2013, www.nature.com/nmeth/journal/v10/n10/full/nmeth.2659.html.
Stawiery, Crystal Jenkins. Properties of Water Lab with Stats.
Sabo, David W. "Probability and Statistics for Biological Sciences." Types of Data, commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section2/typesofdata.htm.

Helpful links:

HHMI Click and Learn used in the primer (the student handout can be downloaded directly from HHMI): http://www.hhmi.org/biointeractive/sampling-and-normal-distribution
Tutorials from HHMI on using spreadsheets: http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc
Brad Williamson, Artificial Selection analysis with Excel: https://www.youtube.com/watch?v=5ggSWuEzxeM&feature=youtu.be
From Chris Chou (TED talk "3 ways to spot a bad statistic"): https://www.ted.com/.../mona_chalabi_3_ways_to.../up-next...
From Pam Close: https://www.nature.com/collections/qghhqm
From Jen Pfannerstill, an excellent article on error bars in biology: http://jcb.rupress.org/content/177/1/7
From Brad Williamson and LaTanya Sharpe: http://media.collegeboard.com/.../AP_Bio_Quantitative...
From Jen Pfannerstill and Brad Williamson: https://apcentral.collegeboard.org/.../professional...
From HHMI (Valerie Bolster May and Ann Brokaw frequently do talks on this): http://www.hhmi.org/bio.../teacher-guide-math-and-statistics
From Paul K Strode: https://www.fairviewhs.org/.../ib-ap-biology-ii/folders/4411
From Rayan Reardon: https://jcibapbiology.wordpress.com/g_statistics-resources/

Name__________________________

Dessy Dimova, Ph.D., Barnegat High School

Statistics for AP Biology

Module 1: Introduction to Statistics

Introduction to Statistics: Statistical methods include procedures for collecting data, presenting and summarizing data, and drawing inferences from sample data (we can infer from a collected sample of data what is occurring in the general population).

Watch Mr. Anderson (until 5:33 only): http://www.bozemanscience.com/statistics-for-science

Since it is not practical to characterize the whole population, we sample. The primary purpose of statistics is to make an inference about the whole population from a sample. Before we can do that, we need to characterize the sample using a few descriptive numbers, also called descriptive statistics.

Watch the rest of Mr. Anderson's video.

Types of Data

There is a distinction between qualitative data and quantitative data. The term qualitative comes from the word "quality", indicating a property, characteristic, feature or attribute. Qualitative data is always a list of words or names of a characteristic. Examples of qualitative variables (which have qualitative "values") are the flavor of ice cream, the color of a person's eyes or hair, the species of a selected life form, the brand of potato chip, etc.

The term quantitative comes from the word "quantity", indicating amount, measure, number, size, etc. Quantitative data is always a list of numerical values where the numbers are more than just names and actually represent measured numerical values. Examples of quantitative variables that might be considered in studying the population of BHS students are the height of a student, the age of a student, and the number of apples the student ate in the past week. However, the student ID number is a qualitative variable rather than a quantitative variable, since it is in some way equivalent to a name for that student.

Sometimes numerical digits are used to represent qualitative values. Thus, the players on a sports team often have numbers on their shirts, but these numbers are qualitative labels, not quantitative values. Similarly, statisticians sometimes code qualitative values with numerical digits, for example letting the numerical digits 0 and 1 stand for the qualities "male" and "female", respectively.

Qualitative data can be "turned" into quantitative data. For example, when we use a scale of 1-5 to represent the range of responses from "strongly disagree" to "strongly agree" on survey-type questionnaires, one could regard the result as qualitative (i.e., one of the list of "strongly disagree", "disagree", "no opinion", "agree" or "strongly agree") or as quantitative (the values 1, 2, 3, 4 and 5 measuring the degree of agreement with the statement given).


Quantitative data we collect can be either discrete or continuous.

Here is an example of discrete data: we toss a die 50 times and record the number that comes up (from 1 to 6). This will yield a string of 50 numbers. The first step in organizing the data is a frequency table. We tally the number of times we got a 1, the number of times we got a 2, and so on. We can then calculate the relative frequency; for instance, if we rolled a 1 a total of 9 times, the relative frequency is 9/50 = 0.18. This information can be graphed and will yield the frequency distribution, that is, how frequently each number was recorded (rolled). The X (number on the die) is called a discrete variable, because it has a finite number of values (having limits, countable).

Continuous example: Suppose you draw a sample of 200 men from a certain population and record their height in inches. The ultimate aim would be to infer the average height of males of the whole population. In this example our variable X is continuous; an individual's height might be any value, such as 64.328 inches. It no longer makes sense to talk about the frequency of this specific value; chances are we'll not observe again someone who is exactly 64.328 inches tall. Instead we group our data into bins, or classes; each bin/class spans a specific range of values (e.g. from 58.5 to 60.5 inches). Thus we will group all measurements that fall between 58.5 and 60.5 inches into one class/bin, and so on. We will then tally the number of measurements (values) in each bin/class and get the frequency of heights within a class that includes a specific range of values. We plot the frequency of each bin in a graph called a histogram. A visual illustration of the usefulness of doing this is shown below.

In this experiment AP Bio students were working with Fast Plants and measuring the number of trichomes, the hairs on the petiole of the first true leaf, of 100 plants. They put their data into an Excel spreadsheet and plotted it. Below you will see a graph representing their data. A graph is supposed to help us visualize our data and draw conclusions based on the data presented. Look at the graph below. What can you say about the number of trichomes of the 100 plants? Do you see any patterns, trends, something that jumps out at you?

__________________________________________________________________________________________________
__________________________________________________________________________________________________

Do you think this visual representation of the students' data is useful?
____________________________________________________________________________________________

[Figure: "Trichomes per plant in the Parent Gen." (y-axis ticks 0 to 40; x-axis ticks 1 to 97)]


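What if the students wanted to determine the average number of trichomes and how many plants there are that are "abnormally" hairy? In the graph below the students constructed a histogram by placing their measurements in several bins, each bin being a group of plants within a range of number of trichomes.

[Figure: histogram "First Generation Trichome, Sept. 2013" (axis ticks 0 to 30)]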
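The binning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trichome counts are made up for the example (they are not the students' data), the bin width of 5 is an arbitrary choice, and the helper name `bin_range` is hypothetical.

```python
from collections import Counter

# Hypothetical trichome counts for 12 plants (made up for illustration;
# NOT the students' data).
counts = [3, 7, 8, 12, 4, 9, 15, 22, 6, 11, 8, 31]

BIN_WIDTH = 5  # assumed bin width; choose whatever suits your data


def bin_range(value, width=BIN_WIDTH):
    """Return the (low, high) bin a value falls in, e.g. 7 -> (6, 10)."""
    # Bins are 1-5, 6-10, 11-15, ... so the high end matches labels like "6-10".
    low = ((value - 1) // width) * width + 1
    return (low, low + width - 1)


# Tally how many plants fall in each bin: this is the frequency table.
frequency = Counter(bin_range(c) for c in counts)

# Print the bins in order, with one '#' per plant as a text histogram.
for low, high in sorted(frequency):
    tally = frequency[(low, high)]
    print(f"{low:>2}-{high:<2}: {'#' * tally} ({tally})")
```

Bins with no plants in them simply do not appear in this tally; a spreadsheet histogram would show them as empty bars.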

Draw 2 conclusions from the histogram, something that wasn't obvious in the first graph.
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

Looking at the histogram, can you come up with one question to ask regarding the population of plants, something that can be answered with experimentation?
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

Descriptive Statistics - the tools we use to describe (characterize) a sample

A Frequency Table and a Frequency Distribution (the histogram) are the first step in organizing quantitative data so that we can characterize it, describe it. You are already familiar with the descriptive statistics (mean, mode, median, range and even standard deviation), but we will introduce them here a little differently. In statistics we use two descriptions of the frequency distribution: 1) the central point of the distribution and 2) its spread.

1. Center, central point - several different concepts of the "center": Mode, Median and Mean
Mean - average; Mode - the most frequent value; Median - middle value of an ordered set of values

Let's examine the following hypothetical frequency distribution. The X values range from 1 to 7. The average/mean is 4.

You can see that 4 is in the exact center of the histogram. 4 also has the highest bar, which means it is the most frequent value, or mode. And it so happens that 4 is also in the middle of the ordered set of values (1; 2; 3; 4; 5; 6; 7), so 4 is the median. In this hypothetical histogram, the center is the mean, mode and median, so all 3 concepts of center overlap perfectly. This histogram/frequency distribution is called the normal distribution; you may have seen it as a bell-shaped curve (you would get a bell-shaped curve if you drew a line connecting the middle of the tops of each bar). A normal distribution has a single peak and is symmetrical. Not all data shows a normal distribution. We will not concern ourselves today with other types of distributions, but it is important for you to understand that frequency distributions and the shapes of histograms vary. Below you will find some examples:

Most people use Excel to organize, characterize, and visualize data as well as to do calculations. Different versions of Excel, especially those that work on different computers and operating systems, are slightly different. These differences are enough to frustrate users who don't do this frequently. This exercise is designed to show you how you can figure out on your own how to use features of Excel and how to get help with different functions.

Let's practice making our own histogram and determining the mean, mode and median of a set of sample data:

1. Open the Excel spreadsheet Trichomedata.xlx. The spreadsheet contains the raw data of the AP Bio students' trichome experiment we talked about above.
2. Label columns I, J and K with Mean, Median and Mode respectively; then calculate the mean, median and mode. To do this, type the equal sign in the appropriate cell (this brings up the formula bar); select AVERAGE (for mean), MEDIAN or MODE; then select all of the data (A2:A101) and hit Enter. This will return the result of the calculation.
3. Copy and paste all the values from column A into column B (label column B "ordered data"); select all of the data in column B and use the Sort function (Sort - Ascending - use column headers). This will sort the values and help you determine the bins you are going to use for the histogram.
What is the smallest value? _________ What is the largest value? _________ What is the range of the data? ________
4. Decide how many bins to use and what the range of each bin will be. In column E (label as Bins) put the high number of the range of each of your bins, e.g. a bin with range 6-10 would read 10. Label column F as frequency or number; this will be your tally, e.g. how many plants had between 6 and 10 trichomes. You can peek back and see what the students did, or you can use your own method for deciding what bins to make.
5. Use Excel Help and type in Frequency; this will take you to the instructions on how to use this formula (I will help you as well). In the first cell, F2, type in =FREQUENCY(A2:A101,E2:E10) (some regional settings use a semicolon instead of the comma). The first set in the () tells the function where the data is that needs to be sorted and counted; the second set in the () tells the function where the bins are and what range they have. In the above example you've made 9 bins, in cells E2-E10. Note that the formula must be entered as an array formula. How this is done depends on the version of Excel and/or computer OS; instructions are in the help section for the FREQUENCY function.
6. Column F should now contain the frequencies, our Y values for the histogram. Our X values would be the actual bins; it is convenient to label them with the actual range, not just the high number, e.g. the label is 6-10, NOT just 10. Copy and paste into a separate column the ranges you used to make the bins; into the column next to it, copy -> paste special (values) the numbers from column F. Select the data from both columns and click Insert Chart; choose the bar/column type of graph.
7. You have now made a histogram that hopefully looks similar to the one above. Note where on the histogram the 3 center points you calculated earlier fall: where are the mean, mode and median on the histogram?


_____________________________________________________________________________________________

Depending on how you made your bins, your histograms may look slightly different, but one thing is clear: the histogram does not look like the example one that was perfectly symmetrical. In real experimentation data rarely looks this perfect, especially when it is continuous data (if you don't remember what that means, scroll up). The mean, median and mode don't overlap perfectly, but they are close enough if the data follows a normal distribution.

2. Spread - Deviations or measures of the spread:

The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller. Consider the figures below. The mean, mode and median value in both cases is 5. In the figure on the left, data values range from 3 to 7, whereas in the figure on the right, values range from 1 to 9. The figure on the right is more variable, so it has the greater spread.

If we determine only the mean (or median and mode), both data sets look identical, but they aren't, so measures of the spread are another very important set of descriptive statistics. The simplest of these you are already familiar with: the range. Range = largest value - smallest value. For the examples above, the range tells us that the figure on the right has a greater spread. However, as already discussed, real frequency distribution data rarely looks so perfectly symmetrical. A more useful measure of the spread, and the one widely used by scientists, is the standard deviation. In the example above, the data points on the left are closer to the mean than in the figure on the right, so the left data will have a smaller standard deviation. Here is the formula (no need to memorize):

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

Let's examine what it means. $X_i$ is any single measurement you have collected, and $\bar{X}$ is the mean. The deviation of each observed value from the mean is $X_i - \bar{X}$. This measures how different each observed value is from the mean. The average deviation then would be the sum of all the individual deviations divided by n, where n is the number of independent measurements. Take a moment and calculate the mean deviation of the data displayed on the histogram on the left in the above figure. Remember the values range from 3 to 7 and the mean is 5. What number did you get? Is this measure useful in determining how varied the data is? Explain.
________________________________________________________________________________________________

If we take the squared value of each deviation, we would always get a positive number (or zero); we can then determine the average squared deviation:

$$\frac{\sum (X_i - \bar{X})^2}{n}$$

The $\sum$ sign simply means sum: we add up all the squared deviations.

This measure is useful in describing the variation of our sample. But why do we care about the spread, the variation in our samples? Consider the example we discussed earlier in which we measure the height of 200 men. Knowing what the spread is will tell us whether there are any extremely short men and/or any extremely tall men in this population.

What if we measure the height of 200 men from Barnegat (population 20,936) and then measure the height of 200 men in Falkenberg (population 20,035), a town in Sweden? We would like to make the claim that men from Falkenberg, Sweden are taller than men from Barnegat. If we simply use the average squared deviation, we would describe both samples. However, we would like to infer from these two samples the average height of the whole population; we want to make a statistical inference about the two populations and whether they truly differ. To be able to do this we need to divide the sum of all squared deviations by n-1, not n. n-1 is the degrees of freedom; remember Mr. Anderson's explanation in the video you watched. So now we have

$$\frac{\sum (X_i - \bar{X})^2}{n - 1}$$

Because the deviations are squared, it would be nice to 'un-square' them, otherwise known as taking the square root; thus the formula for standard deviation becomes

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
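The steps above can be followed one at a time in a short Python sketch. The six values are a made-up sample chosen so the arithmetic is easy to check by hand; they are not taken from any figure or data set in this primer.

```python
import math
import statistics

# Hypothetical sample of six measurements (made up for illustration).
sample = [2, 4, 4, 5, 7, 8]
n = len(sample)

mean = sum(sample) / n                      # X-bar
deviations = [x - mean for x in sample]     # Xi - X-bar for each value

# Plain deviations cancel out: the negatives and positives sum to zero.
sum_of_deviations = sum(deviations)

# Squaring makes every deviation non-negative before averaging.
squared = [d ** 2 for d in deviations]
avg_squared_deviation = sum(squared) / n    # divide by n: describes this sample
sample_variance = sum(squared) / (n - 1)    # divide by n - 1 (degrees of freedom)

# "Un-square" with a square root to get the standard deviation.
sd = math.sqrt(sample_variance)

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("average squared deviation:", avg_squared_deviation)
print("sum of squares divided by n - 1:", sample_variance)
print("standard deviation:", round(sd, 3))

# Cross-check against the standard library's sample standard deviation,
# which also divides by n - 1.
assert math.isclose(sd, statistics.stdev(sample))
```

For this sample the mean is 5, the squared deviations add up to 24, and dividing by n gives 4 while dividing by n - 1 gives 4.8, so the two versions of the formula give noticeably different answers when the sample is small.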

Let's explore what the standard deviation means. Typical data will show a normal distribution (see figure). Empirical rule: in a normal distribution, about 68% of values are within one standard deviation of the mean, about 95% of values are within two standard deviations of the mean, and about 99.7% of the values are within three standard deviations of the mean. Standard deviation is noted as either s or SD. Often the SD is plotted in bar graphs as an error bar; we will talk about error bars in a bit. You will not be asked to calculate standard deviation on the AP exam, nor should you expect to calculate it by hand (with a calculator) in this class. However, you will be asked to manipulate data using Excel. So let's practice.

Go back to the spreadsheet with the trichome data. You have already calculated the mean (average) number of trichomes.

1. Calculate the standard deviation for this data set (=STDEV(A2:A101)) and write it down _______
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________
2. At the bottom of the spreadsheet are tabs. Click on the second tab, "Second Gen". This is a second set of data on trichome numbers. The students have observed and counted the trichomes on the second generation of plants. We will not do any histograms for this second set of data.
3. Calculate the mean and the standard deviation for the 2nd Gen of plants.
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________
6. Open the third tab, "Analysis". Label column A 1st Gen and column B 2nd Gen. Place (copy from the other tabs and paste special - values) in the appropriate columns the mean and SD for both generations of plants. Mark in column C which row is which.


7. The students did something to the second generation of plants (never mind what), and we want to find out whether there is a difference in trichome number between the two generations. Make a bar graph plotting the average trichome number for each generation of plants.
8. Using both the bar graph, which compares the means of the two generations of plants, and the spread of the values you calculated above (68% of data located between ...), can you draw a conclusion whether there is a change in trichome numbers in the 2nd generation? Justify your answer.
____________________________________________________________________________________________________
____________________________________________________________________________________________________

How can we be certain that whatever the students did in that plant experiment resulted in higher numbers of trichomes? Or, in the previous example when we measured the height of 200 men from Barnegat and 200 men from Falkenberg, Sweden, what is our confidence that the 200 men we have sampled truly represent the height of the respective populations of Barnegat and Falkenberg? To help you better understand the difference between the sample mean and the true mean (the actual mean of the whole population) and the significance of sampling and sample size, we will do an activity which will involve several computer simulations, some calculations and some graphing.

Directions:
1. Go to http://www.hhmi.org/biointeractive/sampling-and-normal-distribution; click on the introduction tab and read.
2. Complete items 1-10 on your own. Share your individual results: specifically the Range from your table and what the distributions looked like.
3. Pair up and spend some time discussing and ultimately deciding what an appropriate sample size would be. Then write it down. The simulation can run with the following sample sizes: 4, 9, 16, 25, 100, 400 and 1000. Since we have already done sample size 4, we will now do the rest in item 11 of the worksheet. You can find the sample size that is closest to your ideal sample size, or I can assign you a sample size. We will do #11 and #12 in pairs. I will provide you with grids to plot the sample means.
4. Compare class data and discuss question 13. For a population of infinite size, what sample size would be small enough to be feasible to collect data from while still giving you a good representation of the mean for the population? Explain why you selected this sample size. Record your answer here:
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________

Up until now we have talked about descriptive statistics: mean, range, standard deviation, etc. But in science we would like to be able to draw an inference, a conclusion if you will, about the whole population based on our sample measurements. The sample mean is not necessarily identical to the mean of the entire population (the true mean).


You already discovered that the appearance of a histogram depends on the sample size, and that with small sample sizes the distribution may not be normal, even though we are sampling a normally distributed population. Additionally, we saw that the larger the sample size, the better the representation we get of the mean of the population. Unfortunately it is not practical to have very large sample sizes, and often sample sizes are quite small. So we need a measure of our confidence that the sample mean is a good representation of the true mean (the population mean).

Every time you take a sample and calculate a sample mean, you would expect a slightly different value (you saw this in the simulation). In other words, the sample means themselves have variability. This variability can be expressed by calculating the standard error of the mean (abbreviated as SE or SEM). Standard error is an inferential statistic; this means we can use it to draw an inference, to infer. Inferential statistics give us the confidence (the opposite being uncertainty) that a sample mean is a representation of the true mean, and they can also be used to infer differences between the means of two different populations (remember the plants and the men). Confidence (uncertainty) is usually represented as error bars on graphs. For instance, in the graph you constructed on trichome density, each bar should have an error bar (+/-; a line above and a line below the mean) indicating the range in which we can say with 95% confidence that the true (population) mean lies. But more on error bars later. Right now we will explore the inferential statistic called the standard error of the mean.

Go to Part II of your BioInteractive worksheet. On the simulation, click the arrow to move to the next module, Standard Error of the Mean. Complete items 1-11. DO NOT fill out the last two columns of Table 3.
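The idea explored in this part of the activity can also be imitated in Python. This is not the HHMI simulation, only a minimal sketch under stated assumptions: a made-up normally distributed population (mean 40, standard deviation 2), a sample size of 25, and the usual textbook formula SEM = SD / sqrt(n).

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed population parameters (hypothetical, chosen for illustration).
TRUE_MEAN = 40.0
TRUE_SD = 2.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 1000


def draw_sample(size):
    """Draw `size` values from the assumed normal population."""
    return [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(size)]


# Repeat the "experiment" many times and keep each sample mean.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE))
                for _ in range(NUM_SAMPLES)]

# How much do the sample means themselves vary?
sd_of_sample_means = statistics.stdev(sample_means)

# In real life we have only ONE sample; estimate the same quantity from it.
one_sample = draw_sample(SAMPLE_SIZE)
sem = statistics.stdev(one_sample) / math.sqrt(SAMPLE_SIZE)

print("SD of the 1000 sample means:", round(sd_of_sample_means, 2))
print("SEM estimated from one sample:", round(sem, 2))
print("population SD / sqrt(n):", TRUE_SD / math.sqrt(SAMPLE_SIZE))
```

The first two printed numbers should both land near the third one (0.4 here), which is the point of the SEM: one sample is enough to estimate how much sample means would vary if the experiment were repeated. Changing `SAMPLE_SIZE` shows how that variability shrinks as samples get larger.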

Another inferential statistic is the confidence interval, or CI; typically we talk about the 95% CI. The 95% CI is a range centered around the sample mean. This is a range of values you can be 95% confident contains the true mean. When scientists want to compare the means of two different populations, they will plot the mean of each population with error bars. The error bars give a range; they can be +/- SE, +/- 2 x SE or the 95% CI. Let's briefly examine them.

SE bars - SE (SEM) is an estimate of the standard deviation of the mean, that is, of how much sample means vary around the true mean. In the simulation you saw that when the sample size is large, the SE as calculated by the formula above is very close to the standard deviation of the mean. We can't do an experiment 100 or 400 times like in the simulation, but we can estimate the standard deviation of the mean. For large sample sizes the calculated SE is a good approximation of the SD of the mean, and as such we can apply the empirical rule (see above) and say that 95% of the values are within two standard deviations of the mean, or in this case 2 SEM, so for large sample sizes 95% CI = Mean +/- 2 x SEM. Often you will see 2 x SEM bars and 95% CI bars used interchangeably. If you encounter Mean +/- SEM (a single SEM bar), all you need to do is visually double the error bar to get the 95% CI.

CI (confidence interval) bars are the best, but unfortunately they are rarely used by biologists. Confidence intervals are calculated; there is a statistical formula. They are dependent on the SEM, the sample size and something else. We will not discuss how the CI is calculated, but it is important to realize that doubling the SEM is only an approximation of the true value of the CI. If the sample size is >= 10, then 2 x SEM is approximately the 95% CI. However, for smaller sample sizes this is not the case. If n = 3, we need to multiply the SEM by 4 to get the 95% CI; for n = 5 we multiply by 2.5, etc.

Look at the picture on the right. This is similar to what you saw in the simulation, but let's imagine a "real life" situation. Imagine that the government has managed to measure the height of every 4-year-old boy who currently lives in the US (this is not practical, but it is not impossible). The average height for 4-year-old boys in the US is 40 inches, so the government knows the true mean of the population because that average was derived from measuring EVERY boy.


However, the government is not sharing the data; instead pediatricians rely on their own measurements. Since no one pediatrician can measure all boys, they rely on sampling the population to determine what the average height for 4-year-olds is. 20 large studies were done all over the country, each measuring the height of 400 boys and determining the mean, SD, SE and CI. The 20 different means and their 95% CIs are represented in the figure. The dots are the means, and the lines (error bars) are the 95% CIs, the range in which we expect to find the true mean 19 out of 20 times. In the figure you will see that in 18 out of the 20 experiments the 95% CI captured the true mean; 2 didn't (open circles). In other words, the mean of the data with SE or CI bars gives an indication of the region where you can expect the mean of the whole population, but despite the 95% confidence you may still end up not capturing the true mean.

Error Bars: The table below summarizes common error bars used in biology.

Note: You are NOT expected to know these formulas; they are given to you so you can understand what the error bars mean. Most problems you will encounter will be using 2 x SEM; just be aware that if the sample size is small, you will need to visually extend the error bars (~3-4 x SEM). You may see graphs with +/- SD. We can't use them to infer about the population mean or to compare two different population means; SD is a descriptive statistic and so are SD error bars. They simply show the spread, the variation of the data collected. However, if the description of the experiment includes a sample size (n), you can calculate the SE from the given SD.

Let's return to our sample trichome data. You've calculated the mean and the SD.
1) Calculate the SEM for both generations using the formula above, and place the calculations in the row under your SD.
2) Calculate the 2 x SEM value and place it under your SEM calculations.
3) Add +/- 2 x SEM error bars to your graph. (I will now show you how to add error bars to your graph; I strongly encourage you to take notes.)

Since our sample size is 100, we can be confident that 2 x SEM is a good approximation of the 95% CI. Now that we have added a 95% confidence interval, do your conclusions about the data change? Are you more or less confident in your analysis?


_______________________________________________________________________________________________

So, how do we interpret error bars? When we compare two different populations and want to make a statement that they differ or do not differ in their means, error bars come in handy. Pay attention, as you will be expected to interpret experimental results based on error bars.

1. When the means of the two populations are different and their error bars (2 x SEM or 95% CI) do not overlap each other, nor do they overlap the means, this is pretty strong evidence that there is a difference between the populations.
2. If the error bars overlap just a little bit, but do not overlap the means, you do not have strong evidence either way. If only this type of data is presented, you can say that the data is inconclusive.
3. When the error bars overlap both each other and the means, then most likely there is no difference between the populations.

Take-home lesson #1: Statistics is a guide. It does not prove or disprove anything. It guides us in the interpretation of data. To be 'certain', scientists will collect multiple lines of evidence (perform multiple different experiments, test their predictions in multiple different ways, have independent scientists check each other, etc.) and also try to make biological sense of their data (link it to what is known). We will talk more about statistically significant differences and hypothesis testing later on.

In order to practice using an Excel spreadsheet to analyze data, complete the tutorials listed below.
http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc
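The three error-bar rules listed earlier in this section can also be written out as a small Python sketch. Several things here are assumptions made for illustration: the summary numbers are invented (they are not the trichome results), the SEM is computed with the usual formula SD / sqrt(n), the function names are hypothetical, and rule 3 is read as "at least one error bar reaches the other group's mean".

```python
import math


def two_sem_interval(mean, sd, n):
    """Return (low, high) for mean +/- 2 x SEM, with SEM = SD / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - 2 * sem, mean + 2 * sem


def compare_with_error_bars(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Apply the three rules of thumb for 2 x SEM error bars."""
    low_a, high_a = two_sem_interval(mean_a, sd_a, n_a)
    low_b, high_b = two_sem_interval(mean_b, sd_b, n_b)

    # Rule 1: the bars do not overlap at all.
    bars_overlap = low_a <= high_b and low_b <= high_a
    if not bars_overlap:
        return "strong evidence of a difference"

    # Rule 3: a bar reaches the other group's mean.
    a_covers_b = low_a <= mean_b <= high_a
    b_covers_a = low_b <= mean_a <= high_b
    if a_covers_b or b_covers_a:
        return "most likely no difference"

    # Rule 2: the bars overlap a little, but neither reaches the other mean.
    return "inconclusive"


# Hypothetical summaries: (mean, SD, n) for two groups of 100 plants each.
print(compare_with_error_bars(10, 5, 100, 14.0, 6, 100))
print(compare_with_error_bars(10, 5, 100, 11.8, 5, 100))
print(compare_with_error_bars(10, 5, 100, 10.5, 5, 100))
```

With these made-up numbers the three calls fall under rule 1, rule 2 and rule 3 respectively. The sketch only mirrors the visual rules of thumb; it is a guide to reading graphs, not a hypothesis test.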
