A Note to Teachers

This statistics primer was designed with my students in mind; it may not be appropriate for your students. I made it because my incoming class will consist mostly of students who are very behind in their math skills and are enrolled in lower-level math classes. It may not be appropriate for higher-level students. I have tried to put it together in a way that explains statistics intuitively without too much math (the way I wish someone had taught me statistics), so that I can lay the foundation to build up their quantitative reasoning skills. I have them do some practice with data in Excel, so they can get comfortable. The data is from Brad Williamson's Artificial Selection lab; you can find it under the resource section of the AP Training Module on quantitative methods. All I did was copy his data into a new spreadsheet, just the data and not the graphs, because I want them to practice making graphs. I also use the HHMI Click and Learn resource: Sampling and Normal Distribution.

This is only my Module 1; in Module 2 (still in the works) I plan on introducing statistical hypothesis testing. I plan on using Module 1 at the beginning of the year, but to keep going back to it and to incorporate as much data analysis and statistics as possible. In addition to doing labs and analyzing their own data, we will use other scientists' data to practice, from sources like Data Nuggets, HHMI Data Points and some scientific articles.

Below you will find some links that I thought are very useful, as well as references I have used to put this primer together. I am not trying to re-invent the wheel, which great teachers before me have built, just to adjust it to the needs of my students. This would not have seen the light of day if it wasn't for Brad Williamson and Jen Pfannerstill, Paul K. Strode, Valerie Bolster May, Pam Close, Ann Brokaw and Rayan Reardon, to name just a few. Feel free to use any or all of it, but please give credit where credit is due.

Dessy Dimova, Ph.D.; Barnegat High School, Barnegat, NJ

References and helpful links:

Wonnacott, RJ, and TH Wonnacott. Introductory Statistics. John Wiley & Sons, Inc., 1969.

Williamson, B, and J Pfannerstill. “AP Biology Quantitative Methods: An Introduction to Descriptive Statistics.” AP Biology - Quantitative Methods - Final - v3, College Board, cb.collegeboard.org/ap-training-modules/ap-biology/quantitative-methods/story_html5.html.

Cumming, Geoff. “Error Bars in Experimental Biology.” The Journal of Cell Biology, vol. 177, no. 1, 9 Apr. 2007, pp. 7–11. JSTOR, www.jstor.org/stable/10.2307/30049848?ref=search-gateway:4ea19d3397e6165bff89292b29e5a97f.

Forthofer, Ronald N., and Eun Sul Lee. Introduction to Biostatistics: a Guide to Design, Analysis and Discovery. Academic Press, 1995.

Krzywinski, Martin, and Naomi Altman. “Points of Significance: Error Bars.” Nature News, Nature Publishing Group, 27 Sept. 2013, www.nature.com/nmeth/journal/v10/n10/full/nmeth.2659.html.

Stawiery, Crystal Jenkins. Properties of Water Lab with Stats.

Sabo, David W. “Probability and Statistics for Biological Sciences.” Types of Data, commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section2/typesofdata.htm.

Helpful links:

HHMI Click and Learn used in the primer; the student handout can be downloaded directly from HHMI: http://www.hhmi.org/biointeractive/sampling-and-normal-distribution

Tutorials from HHMI on using spreadsheets: http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc

Brad Williamson - Artificial Selection analysis with Excel: https://www.youtube.com/watch?v=5ggSWuEzxeM&feature=youtu.be

From Chris Chou (TED talk, "3 ways to spot a bad statistic"): https://www.ted.com/.../mona_chalabi_3_ways_to.../up-next...

From Pam Close: https://www.nature.com/collections/qghhqm

From Jen Pfannerstill - excellent article on error bars in Biology: http://jcb.rupress.org/content/177/1/7

From Brad Williamson and LaTanya Sharpe: http://media.collegeboard.com/.../AP_Bio_Quantitative...

From Jen Pfannerstill and Brad Williamson: https://apcentral.collegeboard.org/.../professional...

From HHMI. Valerie Bolster May and Ann Brokaw frequently do talks on this: http://www.hhmi.org/bio.../teacher-guide-math-and-statistics

From Paul K Strode: https://www.fairviewhs.org/.../ib-ap-biology-ii/folders/4411

From Rayan Reardon: https://jcibapbiology.wordpress.com/g_statistics-resources/
Name __________________________
Dessy Dimova, Ph.D. Barnegat High School

Statistics for AP Biology
Module 1: Introduction to Statistics

Introduction to Statistics: Statistical methods include procedures for collecting data, presenting and summarizing data, and drawing inferences from sample data (we can infer from a collected sample of data what is occurring in the general population).

Watch Mr. Anderson (until 5:33 only): http://www.bozemanscience.com/statistics-for-science

Since it is not practical to characterize the whole population, we sample. The primary purpose of statistics is to make an inference about the whole population from a sample. Before we can do that we need to characterize the sample using a few descriptive numbers, which are also called descriptive statistics.

Watch the rest of Mr. Anderson's video.
Types of Data

There is a distinction between qualitative data and quantitative data. The term qualitative comes from the word "quality", indicating a property, characteristic, feature or attribute. Qualitative data is always a list of words or names of a characteristic. Examples of qualitative variables (which have qualitative "values") are the flavor of ice cream, the color of a person's eyes or hair, the species of a selected life form, the brand of potato chip, etc.

The term quantitative comes from the word "quantity", indicating amount, measure, number, size, etc. Quantitative data is always a list of numerical values where the numbers are more than just names and actually represent measured numerical values. Examples of quantitative variables that might be considered in studying the population of BHS students are the height of a student, the age of a student, and the number of apples the student ate in the past week. However, the student ID number is a qualitative variable rather than a quantitative variable, since it is in some way equivalent to a name for that student.

Sometimes numerical digits are used to represent qualitative values. Thus, the players on a sports team often have numbers on their shirts, but these numbers are qualitative labels, not quantitative values. Similarly, statisticians sometimes code qualitative values with numerical digits, for example, letting the numerical digits 0 and 1 stand for the qualities "male" and "female", respectively.

Qualitative data can be "turned" into quantitative data. For example, when we use a scale of 1-5 to represent the range of responses from "strongly disagree" to "strongly agree" on survey-type questionnaires, one could regard the result as qualitative (i.e., one of the list of "strongly disagree", "disagree", "no opinion", "agree" or "strongly agree") or as quantitative (the values 1, 2, 3, 4, and 5 measuring the degree of agreement with the statement given).
Quantitative data we collect can be either discrete or continuous.

Here is an example of discrete data: we toss a die 50 times and record the number that comes up (from 1 to 6). This will yield a string of 50 numbers. The first step to organizing the data is a frequency table. We tally the number of times we got a 1, the number of times we got a 2, and so on. We can then calculate the relative frequency; for instance, if we rolled a 1 a total of 9 times, the relative frequency is 9/50 = 0.18. This information can be graphed and will yield the frequency distribution, that is, how frequently each number was recorded (rolled). The X (number on the die) is called a discrete variable, because it has a finite number of values (having limits, countable).

Continuous example: Suppose you draw a sample of 200 men from a certain population and record their height in inches. The ultimate aim would be to infer the average height of males of the whole population. In this example, our variable X is continuous; an individual's height might be any value, such as 64.328 inches. It no longer makes sense to talk about the frequency of this specific value, since chances are we will not observe again someone who is exactly 64.328 inches tall. Instead we group our data into bins, or classes. Each bin/class spans a specific range of values (e.g. from 58.5 to 60.5 inches). Thus we will group all measurements that fall between 58.5 and 60.5 inches into one class/bin, and so on. We will then tally the number of measurements (values) in each bin/class and get the frequency of heights within a class that includes a specific range of values. We plot the frequency of each bin in a graph called a histogram. A visual illustration of the usefulness of doing this is shown below.

In this experiment AP Bio students were working with Fast Plants and measuring the number of trichomes, the hairs on the petiole of the first true leaf, of 100 plants. They put their data into an Excel spreadsheet and plotted it. Below you will see a graph representing their data. A graph is supposed to help us visualize our data and draw conclusions based on the data presented. Look at the graph below. What can you say about the number of trichomes of the 100 plants? Do you see any patterns, trends, something that jumps out at you?
____________________________________________________________
____________________________________________________________
Do you think this visual representation of the students' data is useful?
____________________________________________________________
[Figure: column chart titled "Trichomes per plant in the Parent Gen."; x-axis ticks 1 to 97, y-axis ticks 0 to 40.]
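For readers who would like to see the tallying and binning steps described above outside of a spreadsheet, here is a minimal Python sketch. The die rolls and heights are made-up illustration data, and the function names `frequency_table` and `bin_counts` are hypothetical helpers, not part of the primer or of Excel.

```python
from collections import Counter

# Hypothetical die rolls (made-up data, not from the primer).
rolls = [1, 3, 6, 2, 1, 5, 4, 1, 6, 3]

def frequency_table(values):
    """Return {value: (count, relative frequency)} for discrete data."""
    counts = Counter(values)
    n = len(values)
    return {v: (counts[v], counts[v] / n) for v in sorted(counts)}

def bin_counts(values, low, width, n_bins):
    """Count continuous values in equal-width bins.

    Bin i covers the range from low + i*width up to (but not including)
    low + (i+1)*width. Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for x in values:
        i = int((x - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical heights in inches (made-up data).
heights = [59.2, 60.1, 64.328, 63.0, 61.7, 62.4, 64.9, 60.4]

table = frequency_table(rolls)
hist = bin_counts(heights, low=58.5, width=2.0, n_bins=4)

for value, (count, rel) in table.items():
    print(value, count, rel)
print(hist)  # one count per bin, starting with 58.5 to 60.5 inches
```

The bin counts are exactly the bar heights a histogram would show, which is why the choice of bin width changes the picture but not the underlying data.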
What if the students wanted to determine the average number of trichomes and how many plants there are that are "abnormally" hairy? In the graph below the students constructed a histogram by placing their measurements in several bins, each bin being a group of plants within a range of number of trichomes.
Draw 2 conclusions from the histogram, something that wasn't obvious in the first graph.
____________________________________________________________
____________________________________________________________
Looking at the histogram, can you come up with one question to ask regarding the population of plants, something that can be answered with experimentation?
____________________________________________________________
____________________________________________________________

Descriptive Statistics - the tools we use to describe (characterize) a sample

A Frequency Table and a Frequency Distribution (the histogram) are the first step to organizing quantitative data so that we can characterize it, describe it. You are already familiar with the descriptive statistics - mean, mode, median, range and even standard deviation - but we will introduce them here a little differently. In statistics we use two descriptions of the frequency distribution: 1) the central point of the distribution and 2) its spread.

1. Center, central point - several different concepts of the "center": Mode, Median and Mean

Mean - average; Mode - the most frequent value; Median - middle value of an ordered set of values

Let's examine the following hypothetical frequency distribution. The X values range from 1 to 7. The average/mean is 4.
You can see that 4 is in the exact center of the histogram. 4 also has the highest bar, which means it is the most frequent value, or mode. And it so happens that 4 is also in the middle of the ordered set of values (1; 2; 3; 4; 5; 6; 7), so 4 is the median. In this hypothetical histogram, the center is the mean, mode and median, so all 3 concepts of center overlap perfectly. This histogram/frequency distribution is called the normal distribution; you may have seen it as a bell-shaped curve. (You would get a bell-shaped curve if you drew a line connecting the middle of the tops of each bar.) A normal distribution has a single peak and is symmetrical. Not all data shows a normal distribution. We will not concern ourselves today with other types of distributions, but it is important for you to understand that frequency distributions and the shape of histograms vary. Below you will find some examples:

[Figure: the students' trichome histogram referred to earlier, titled "First Generation Trichome, Sept. 2013"; axis ticks 0 to 30.]
Most people use Excel to organize, characterize, and visualize data as well as to do calculations. Different versions of Excel, especially those that work on different computers and OSs, are slightly different. These differences are enough to frustrate users who don't do this frequently. This exercise is designed to show you how you can figure out on your own how to use features of Excel and how to get help with different functions.

Let's practice making our own histogram and determining the mean, mode and median of a set of sample data:

1. Open the Excel spreadsheet Trichomedata.xlx. The spreadsheet contains the raw data of the AP Bio students' trichome experiment we talked about above.

2. Label Columns I, J and K with Mean, Median and Mode respectively; then calculate the mean, median and mode. To do this, type the equal sign in the appropriate cell, which brings up the formula bar; select AVERAGE (for mean), MEDIAN or MODE; then select all of the data (A2:A101) and hit enter. This will return the result of the calculation.

3. Copy and paste all the values from Column A into Column B (label Column B - ordered data); select all of the data in Column B and use the Sort function (Sort - Ascending - use column headers). This will sort the values and help you determine the bins you are going to use for the histogram.
What is the smallest value? _________ What is the largest value? _________ What is the range of the data? ________

4. Decide how many bins to use and what the range of each bin will be. In Column E (label as Bins) put the high number of the range of each of your bins, e.g. a bin with range 6-10 would read 10. Label Column F as frequency or number; this will be your tally, e.g. how many plants had between 6 and 10 trichomes. You can peek back and see what the students did, or you can use your own method for deciding what bins to make.

5. Use Excel Help and type in Frequency; this will take you to the instructions on how to use this formula (I will help you as well). In the first cell, F2, type in =FREQUENCY(A2:A101,E2:E10). The first set in the () tells the function where the data is that needs to be sorted and counted; the second set in the () tells the function where the bins are and what range they have. In the above example you've made 9 bins, one in each of cells E2-E10. Note that the formula must be entered as an array formula. How this is done depends on the version of Excel and/or computer OS; instructions are in the help section for the Frequency function.

6. Column F should now contain the frequencies, or our Y values for the histogram. Our X values would be the actual bins. It is convenient to label them with the actual range, not just the high number, e.g. the label is 6-10, NOT just 10. Copy and paste into a separate column the ranges you used to make the bins; in the column next to it, copy -> paste special (values) the numbers from Column F. Select the data from both columns and click insert chart; choose the bar/column type of graph.

7. You have now made a histogram that hopefully looks similar to the one above. Note where on the histogram the 3 center points you calculated earlier fall - where are the mean, mode and median on the histogram?
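The spreadsheet steps above can also be sketched in Python, which may help when checking your Excel results. This is only an illustration under stated assumptions: the trichome counts below are made up (the real values are in the spreadsheet), and `frequency` is a hypothetical helper written to imitate a tally in which each bin value is treated as the upper limit of its bin.

```python
import statistics
from bisect import bisect_left

# Hypothetical trichome counts (made-up data, not the students' data).
counts = [2, 5, 7, 7, 8, 10, 11, 12, 15, 23]

# The three measures of center from step 2.
mean = statistics.mean(counts)
median = statistics.median(counts)
mode = statistics.mode(counts)

def frequency(data, upper_edges):
    """Tally values into bins given by their (sorted) upper edges.

    A value goes into the first bin whose upper edge is >= the value.
    The extra last slot counts values above the highest edge.
    """
    tally = [0] * (len(upper_edges) + 1)
    for x in data:
        tally[bisect_left(upper_edges, x)] += 1
    return tally

upper_edges = [5, 10, 15, 20, 25]
labels = ["0-5", "6-10", "11-15", "16-20", "21-25", ">25"]
tally = frequency(counts, upper_edges)

print(mean, median, mode)
for label, n in zip(labels, tally):
    print(label, n)
```

Labeling each bin with its full range, as in step 6, is what makes the printed tally readable as a histogram table.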
____________________________________________________________

Depending on how you made your bins, your histograms may look slightly different, but one thing is clear: the histogram does not look like the example one that was perfectly symmetrical. In real experimentation data rarely looks this perfect, especially when it is continuous data (if you don't remember what that means, scroll up). The mean, median and mode don't overlap perfectly, but are close enough if the data follows a normal distribution.

2. Spread - Deviations or measures of the spread:

The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller. Consider the figures below. The mean, mode and median value in both cases is 5. In the figure on the left, data values range from 3 to 7; whereas in the figure on the right, values range from 1 to 9. The figure on the right is more variable, so it has the greater spread.
If we determine only the mean (or median and mode), both data sets look identical, but they aren't, so measures of the spread are another very important set of descriptive statistics. The simplest of these you are already familiar with - the range. Range = largest value - smallest value. For the examples above, the range tells us that the figure on the right has a greater spread. However, as already discussed, real frequency distribution data rarely looks so perfectly symmetrical. A more useful measure of the spread, and the one widely used by scientists, is the standard deviation. In the example above, the data points on the left are closer to the mean than in the figure on the right, so the left data will have a smaller standard deviation. Here is the formula (no need to memorize):

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

Let's examine what it means. $X_i$ is any single measurement you have collected, and $\bar{X}$ is the mean. The deviation of each observed value from the mean is $X_i - \bar{X}$. This measures how different each observed value is from the mean. The average deviation then would be to sum all the individual deviations and then divide by n, where n is the number of independent measurements. Take a moment and calculate the mean deviation of the data displayed on the histogram on the left in the above figure. Remember the values range from 3 to 7 and the mean is 5. What number did you get? Is this measure useful in determining how varied the data is? Explain.
____________________________________________________________

If we take the squared value of each deviation, we never get a negative number, so we can then determine the average squared deviation:

$$\frac{\sum (X_i - \bar{X})^2}{n}$$

The $\sum$ sign simply means sum: we add up all the squared deviations.
This measure is useful in describing the variation of our sample. But why do we care about the spread, the variation in our samples? Consider the example we discussed earlier in which we measure the height of 200 men. Knowing what the spread is will tell us whether there are any extremely short men and/or whether there are some extremely tall men in this population. What if we measure the height of 200 men from Barnegat (population 20,936) and then measure the height of 200 men in Falkenberg (population 20,035), a town in Sweden? We would like to make the claim that men from Falkenberg, Sweden are taller than men from Barnegat. If we simply use the average squared deviation, we describe both samples. However, we would like to infer from these two samples the average height of each whole population; we want to make a statistical inference about the two populations and whether they truly differ. To be able to do this we need to divide the sum of all squared deviations by n-1, not n. n-1 is the degrees of freedom; remember Mr. Anderson's explanation in the video you watched. So now we have

$$\frac{\sum (X_i - \bar{X})^2}{n - 1}$$

Because the deviations are squared, it would be nice to 'un-square' them, otherwise known as taking the square root. Thus the formula for standard deviation becomes

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
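To make the steps of the standard deviation formula concrete, here is a short Python sketch that follows them one at a time: deviations, squared deviations, division by n-1, and finally the square root. The measurements are made-up illustration data. The last lines compare the hand-built result with `statistics.stdev` from Python's standard library, which also divides by n-1.

```python
import math
import statistics

# Hypothetical measurements (made-up data, not from the primer).
values = [3, 4, 5, 5, 5, 6, 7]

n = len(values)
mean = sum(values) / n

# Squared deviation of each value from the mean.
squared_devs = [(x - mean) ** 2 for x in values]

# Dividing by n describes only this sample.
avg_squared_dev = sum(squared_devs) / n

# Dividing by n - 1 (degrees of freedom) is used when inferring about a population.
sample_variance = sum(squared_devs) / (n - 1)

# "Un-square" with a square root to get the standard deviation.
sample_sd = math.sqrt(sample_variance)

print(avg_squared_dev, sample_variance, sample_sd)
print(statistics.stdev(values))  # library version, same n - 1 divisor
```

Notice that the n-1 version is always a little larger than the n version; the difference shrinks as the sample gets bigger.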
Let's explore what the standard deviation means. Typical data will show a normal distribution (see figure).

Empirical rule: In a normal distribution, about 68% of values are within one standard deviation of the mean, about 95% of values are within two standard deviations of the mean, and about 99.7% of the values are within three standard deviations of the mean.

Standard deviation is noted as either s or SD. Often the SD is plotted in bar graphs as an error bar; we will talk about error bars in a bit. You will not be asked to calculate standard deviation on the AP exam, nor should you expect to calculate it by hand (with a calculator) in this class. However, you will be asked to manipulate data using Excel. So let's practice.

Go back to the spreadsheet with the trichome data. You have already calculated the mean (average) number of trichomes.

1. Calculate the standard deviation for this data set (=STDEV(A2:A101)) and write it down _______
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________

2. At the bottom of the spreadsheet are tabs. Click on the second tab, "Second Gen". This is a second set of data on trichome numbers. The students have observed and counted the trichomes on the second generation of plants. We will not do any histograms for this second set of data.

3. Calculate the mean and the standard deviation for the 2nd Gen of plants.
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________

6. Open the third tab, "Analysis". Label column A 1st Gen and column B 2nd Gen. Place (copy from the other tabs and paste special - values) in the appropriate columns the mean and SD for both generations of plants. Mark in Column C which row is which.
7. The students did something to the second generation of plants (never mind what), and we want to find out whether there is a difference in trichome number between the two generations. Make a bar graph plotting the average trichome number for each generation of plants.

8. Using both the bar graph, which compares the means of the two generations of plants, and the spread of the values you calculated above (68% of data located between ...), can you draw a conclusion whether there is a change in trichome numbers in the 2nd generation? Justify your answer.
____________________________________________________________
____________________________________________________________

How can we be certain that whatever the students did in that plant experiment resulted in higher numbers of trichomes? Or, in the previous example when we measured the height of 200 men from Barnegat and 200 men from Falkenberg, Sweden, what is our confidence that the 200 men we have sampled truly represent the height of the respective populations of Barnegat and Falkenberg?

To help you better understand the difference between the sample mean and the true mean (the actual mean of the whole population) and the significance of sampling and sample size, we will do an activity which will involve several computer simulations, some calculations and some graphing.

Directions:

1. Go to http://www.hhmi.org/biointeractive/sampling-and-normal-distribution; click on the introduction tab and read.

2. Complete items 1-10 on your own. Share your individual results: specifically the range from your table and what the distributions looked like.

3. Pair up and spend some time discussing and ultimately deciding what an appropriate sample size would be. Then write it down. The simulation can run with the following sample sizes: 4, 9, 16, 25, 100, 400 and 1000. Since we have already done sample size 4, we will now do the rest in item 11 of the worksheet. You can find the sample size that is closest to your ideal sample size or I can assign you a sample size. We will do #11 and #12 in pairs. I will provide you with grids to plot the sample means.

4. Compare class data and discuss question 13. For a population of infinite size, what sample size would be small enough to be feasible to collect data from while still giving you a good representation of the mean for the population? Explain why you selected this sample size. Record your answer here:
____________________________________________________________
____________________________________________________________
____________________________________________________________

Up until now we have talked about descriptive statistics: mean, range, standard deviation, etc. But in science we would like to be able to draw an inference, to draw a conclusion if you will, about the whole population based on our sample measurements. The sample mean is not necessarily identical to the mean of the entire population (the true mean).
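The idea that sample means wander around the true mean, and wander less as the sample size grows, can be sketched with a small Python simulation. This is not the HHMI simulation, just an illustration in the same spirit. The population mean of 50 and SD of 10, the number of repeated samples, and the helper name `spread_of_sample_means` are all assumptions chosen for the example.

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_samples=500,
                           true_mean=50.0, true_sd=10.0, seed=1):
    """Draw many samples from a normal population and summarize their means.

    Returns (smallest sample mean, largest sample mean, SD of the sample means).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return min(means), max(means), statistics.stdev(means)

results = {n: spread_of_sample_means(n) for n in (4, 25, 100)}

for n, (lowest, highest, sd_of_means) in results.items():
    print(n, round(lowest, 1), round(highest, 1), round(sd_of_means, 2))
```

With larger samples the lowest and highest sample means sit closer to the true mean, which is the pattern to look for when choosing a sample size.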
You already discovered that the appearance of a histogram depends on the sample size, and that with small sample sizes the distribution may not look normal, even though we are sampling a normally distributed population. Additionally, we saw that the larger the sample size, the better representation we get of the mean of the population. Unfortunately it is not practical to have very large sample sizes, and often sample sizes are quite small. So we need a measure of our confidence that the sample mean is a good representation of the true mean (the population mean).

Every time you take a sample and calculate a sample mean, you would expect a slightly different value (you saw this in the simulation). In other words, the sample means themselves have variability. This variability can be expressed by calculating the standard error of the mean (abbreviated as SE or SEM). Standard error is an inferential statistic, which means we can use it to draw inferences, to infer. Inferential statistics give us the confidence (the opposite being uncertainty) that a sample mean is a representation of the true mean; they can also be used to infer differences between the means of two different populations (remember the plants and the men).

Confidence (uncertainty) is usually represented as error bars on graphs. For instance, in the graph you constructed on trichome density, each bar should have an error bar (+/-; a line above and a line below the mean) indicating the range in which we can say with 95% confidence that the true (population) mean lies. But more on error bars later. Right now we will explore the inferential statistic called standard error of the mean.
Go to Part II of your Biointeractive WS. On the simulation click the arrow to move to the next module - Standard Error of the Mean. Complete items 1-11. DO NOT fill out the last two columns of Table 3.
Another inferential statistic is the confidence interval, or CI; typically we talk about the 95% CI. The 95% CI is a range centered around the sample mean. This is a range of values you can be 95% confident contains the true mean. When scientists want to compare the means of two different populations they will plot the mean of each population with error bars. The error bars give a range; they can be +/- SE, +/- 2 x SE, or 95% CI. Let's briefly examine them.

SE bars - SE (SEM) is an estimation of the standard deviation of the mean (the true mean). In the simulation you saw that when the sample size is large, the SE as calculated by the formula above is very close to the standard deviation of the mean. We can't do an experiment 100 or 400 times like in the simulation, but we can estimate the standard deviation of the mean (the true mean). For large sample sizes the calculated SE is a good approximation of the SD of the mean of the population, and as such we can apply the empirical rule (see the empirical rule above) and say that 95% of the values are within two standard deviations of the mean, or in this case 2 SEM. So for large sample sizes 95% CI = Mean +/- 2 x SEM. Often you will see 2 x SEM bars and 95% CI bars used interchangeably. If you encounter Mean +/- SEM (single SEM bar), all you need to do is visually double the error bar to get the 95% CI.

CI (confidence interval) bars - are the best, but unfortunately rarely used by biologists. Confidence intervals are calculated; there is a statistical formula. They are dependent on the SEM, the sample size and something else. We will not discuss how the CI is calculated, but it is important to realize that doubling the SEM is only an approximation of the true value of the CI. If the sample size is > or = 10, then 2 x SEM is approximately the 95% CI. However, for smaller sample sizes this is not the case. If n=3, we need to multiply SEM by 4 to get the 95% CI; for n=5 we multiply by 2.5, etc.

Look at the picture on the right. This is similar to what you saw in the simulation, but let's imagine a "real life" situation. Imagine that the government has managed to measure the height of every 4 year old boy who currently lives in the US (this is not practical, but it is not impossible). The average height for 4 year old boys in the US is 40 inches, so the government knows the true mean of the population because that average was derived from measuring EVERY boy.
However, the government is not sharing the data; instead pediatricians rely on their own measurements. Since no one pediatrician can measure all boys, they rely on sampling the population to determine what is an average height for 4 year olds. Twenty large studies were done all over the country, each measuring the height of 400 boys and determining the mean, SD, SE and CI. The 20 different means and the 95% CIs are represented in the figure. The dots are the means, and the lines (error bars) are the 95% CI, the range in which we expect to find the true mean 19 out of 20 times. In the figure you will see that in 18 out of the 20 experiments the 95% CI captured the true mean, and 2 didn't (open circles). In other words, the mean of the data with SE or CI bars gives an indication of the region where you can expect the mean of the whole population, but despite the 95% confidence you may still end up not capturing the true mean.

Error Bars: The table below summarizes common error bars used in Biology.
Note: You are NOT expected to know these formulas; they are given to you so you can understand what the error bars mean. Most problems you will encounter will be using 2 x SEM; just be aware that if the sample size is small, you will need to visually extend the error bars (~3-4 x SEM). You may see graphs with +/- SD. We can't use them to infer about the population mean or to compare two different population means. SD is a descriptive statistic and so are SD error bars; they simply show the spread, the variation of the data collected. However, if the description of the experiment includes a sample size (n), you can calculate the SE from the given SD.

Let's return to our sample trichome data. You've calculated the mean and the SD.

1) Calculate the SEM for both generations using the formula above, and place the calculations in the row under your SD.

2) Calculate the 2 x SEM value and place it under your SEM calculations.

3) Add +/- 2 x SEM error bars to your graph. (I will now show you how to add error bars to your graph; I strongly encourage you to take notes.)

Since our sample size is 100, we can be sure that 2 x SEM = 95% CI. Now that we have added a 95% confidence interval, do your conclusions about the data change? Are you more or less confident in your analysis?
____________________________________________________________

So, how do we interpret error bars? When we compare two different populations and want to make a statement that they differ or do not differ in their means, error bars come in handy. Pay attention, as you will be expected to interpret experimental results based on error bars.

1. If the means of the two populations are different and their error bars (2 x SEM or 95% CI) do not overlap each other, nor do they overlap the other mean, this is pretty strong evidence that there is a difference between the populations.

2. If the error bars overlap just a little bit, but do not overlap the means, you do not have strong evidence either way. If only this type of data is presented, you can say that the data is inconclusive.

3. When the error bars overlap both each other and the means, then most likely there is no difference between the populations.

Take home lesson #1: Statistics is a guide. It does not prove or disprove anything. It guides us in the interpretation of data. To be 'certain', scientists will collect multiple lines of evidence (perform multiple different experiments, test their predictions in multiple different ways, have independent scientists check each other, etc.) and also try to make biological sense of their data (link it to what is known). We will talk more about statistically significant differences and hypothesis testing later on.

In order to practice using an Excel spreadsheet to analyze data, complete the tutorials listed below.
http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc
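As a recap of the three error-bar rules above, here is a hedged Python sketch that computes mean +/- 2 x SEM for two samples and reports which rule applies. It assumes SEM = SD divided by the square root of n, and it treats 2 x SEM as the 95% CI, which the primer says is reasonable only for sample sizes of about 10 or more. The trichome-like numbers are made up, and `mean_and_interval` and `compare` are hypothetical helper names. Reading "overlap the means" as "either bar reaches the other sample's mean" is also an assumption of this sketch.

```python
import math
import statistics

def mean_and_interval(sample):
    """Return (mean, low, high), where low/high are mean -/+ 2 x SEM."""
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))  # SEM = SD / sqrt(n)
    return m, m - 2 * sem, m + 2 * sem

def compare(sample_a, sample_b):
    """Apply the three error-bar rules to two samples."""
    mean_a, low_a, high_a = mean_and_interval(sample_a)
    mean_b, low_b, high_b = mean_and_interval(sample_b)
    bars_overlap = low_a <= high_b and low_b <= high_a
    if not bars_overlap:
        return "strong evidence of a difference"      # rule 1
    covers_other_mean = (low_a <= mean_b <= high_a) or (low_b <= mean_a <= high_b)
    if covers_other_mean:
        return "most likely no difference"            # rule 3
    return "inconclusive"                             # rule 2

# Hypothetical counts for two generations (made-up data, n = 10 each).
gen1 = [8, 10, 9, 11, 12, 10, 9, 11, 10, 10]
gen2 = [18, 20, 19, 21, 22, 20, 19, 21, 20, 20]
slightly_shifted = [x + 1.2 for x in gen1]

print(compare(gen1, gen2))
print(compare(gen1, slightly_shifted))
print(compare(gen1, gen1))
```

As the take home lesson says, a label like "strong evidence" is a guide for interpretation, not a proof.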