a note to teachers - life sciences rlhs · williamson and jen pfannerstill, paul k. strode, valerie...

A Note to Teachers

This statistics primer was designed with my students in mind; it may not be appropriate for your students. I made it because my incoming class will consist mostly of students who are very behind in their math skills and are enrolled in lower-level math classes. It may not be appropriate for higher-level students. I have tried to put it together in a way that explains statistics intuitively without too much math (the way I wish someone had taught me statistics), so that I can lay the foundation to build up their quantitative reasoning skills. I have them do some practice with data in Excel so they can get comfortable. The data is from Brad Williamson's Artificial Selection lab; you can find it under the resource section of the AP Training Module on quantitative methods. All I did was copy his data into a new spreadsheet, just the data and not the graphs, because I want them to practice making graphs. I also use the HHMI Click and Learn resource: Sampling and Normal Distribution.

This is only my Module 1; in Module 2 (still in the works) I plan on introducing statistical hypothesis testing. I plan on using Module 1 at the beginning of the year, but to keep going back to it and to incorporate as much data analysis and statistics as possible. In addition to doing labs and analyzing their own data, we will use other scientists' data to practice, from sources like DataNuggets, HHMI Data Points and some scientific articles.

Below you will find some links that I thought were very useful, as well as the references I used to put this primer together. I am not trying to reinvent the wheel; great teachers before me have built it. I am just adjusting it to the needs of my students. This would not have seen the light of day if it weren't for Brad Williamson and Jen Pfannerstill, Paul K. Strode, Valerie Bolster May, Pam Close, Ann Brokaw and Rayan Reardon, to name just a few. Feel free to use any or all of it, but please give credit where credit is due.

Dessy Dimova, Ph.D.; Barnegat High School, Barnegat, NJ

References and helpful links:

Wonnacott, R J, and T H Wonnacott. Introductory Statistics. John Wiley & Sons, Inc., 1969.
Williamson, B, and J Pfannerstill. “AP Biology Quantitative Methods: An Introduction to Descriptive Statistics.” APBiology-QuantitativeMethods-Final-v3, College Board, cb.collegeboard.org/ap-training-modules/ap-biology/quantitative-methods/story_html5.html.
Cumming, Geoff. “Error Bars in Experimental Biology.” The Journal of Cell Biology, vol. 177, no. 1, 9 Apr. 2007, pp. 7–11. JSTOR, www.jstor.org/stable/10.2307/30049848?ref=search-gateway:4ea19d3397e6165bff89292b29e5a97f.
Forthofer, Ronald N., and Eun Sul Lee. Introduction to Biostatistics: a Guide to Design, Analysis and Discovery. Academic Press, 1995.
Krzywinski, Martin, and Naomi Altman. “Points of Significance: Error Bars.” Nature News, Nature Publishing Group, 27 Sept. 2013, www.nature.com/nmeth/journal/v10/n10/full/nmeth.2659.html.
Stawiery, Crystal Jenkins. Properties of Water Lab with Stats.

Post on 24-Mar-2020

Page 1: A Note to Teachers - Life Sciences RLHS · Williamson and Jen Pfannerstill, Paul K. Strode, Valerie Bolster May, Pam Close, Ann Brokaw, Rayan Reardon to name just a few. Feel free



Sabo, David W. “Probability and Statistics for Biological Sciences.” Types of Data, commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section2/typesofdata.htm.

Helpful links:

HHMI Click and Learn used in the primer (the student handout can be downloaded directly from HHMI): http://www.hhmi.org/biointeractive/sampling-and-normal-distribution
Tutorials from HHMI on using spreadsheets: http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc
Brad Williamson, Artificial Selection analysis with Excel: https://www.youtube.com/watch?v=5ggSWuEzxeM&feature=youtu.be
From Chris Chou (TED talk "3 ways to spot a bad statistic"): https://www.ted.com/.../mona_chalabi_3_ways_to.../up-next...
From Pam Close: https://www.nature.com/collections/qghhqm
From Jen Pfannerstill, an excellent article on error bars in biology: http://jcb.rupress.org/content/177/1/7
From Brad Williamson and LaTanya Sharpe: http://media.collegeboard.com/.../AP_Bio_Quantitative...
From Jen Pfannerstill and Brad Williamson: https://apcentral.collegeboard.org/.../professional...
From HHMI (Valerie Bolster May and Ann Brokaw frequently do talks on this): http://www.hhmi.org/bio.../teacher-guide-math-and-statistics
From Paul K Strode: https://www.fairviewhs.org/.../ib-ap-biology-ii/folders/4411
From Rayan Reardon: https://jcibapbiology.wordpress.com/g_statistics-resources/

Name__________________________

Dessy Dimova, Ph.D., Barnegat High School

Statistics for AP Biology

Module 1: Introduction to Statistics

Introduction to Statistics: Statistical methods include procedures for collecting data, presenting and summarizing data, and drawing inferences from sample data (we can infer from a collected sample of data what is occurring in the general population).

Watch Mr. Anderson (until 5:33 only): http://www.bozemanscience.com/statistics-for-science

Since it is not practical to characterize the whole population, we sample. The primary purpose of statistics is to make an inference about the whole population from a sample. Before we can do that, we need to characterize the sample using a few descriptive numbers, which are also called descriptive statistics.

Watch the rest of Mr. Anderson's video.

Types of Data

There is a distinction between qualitative data and quantitative data.

The term qualitative comes from the word "quality", indicating a property, characteristic, feature or attribute. Qualitative data is always a list of words or names of a characteristic. Examples of qualitative variables (which have qualitative "values") are the flavor of ice cream, the color of a person's eyes or hair, the species of a selected life form, the brand of potato chip, etc.

The term quantitative comes from the word "quantity", indicating amount, measure, number, size, etc. Quantitative data is always a list of numerical values where the numbers are more than just names, but actually represent measured numerical values. Examples of quantitative variables that might be considered in studying the population of BHS students are the height of a student, the age of a student, and the number of apples the student ate in the past week. However, the student ID number is a qualitative variable rather than a quantitative variable, since it is in some way equivalent to a name for that student.

Sometimes numerical digits are used to represent qualitative values. Thus, the players on a sports team often have numbers on their shirts, but these numbers are qualitative labels, not quantitative values. Similarly, statisticians sometimes code qualitative values with numerical digits, for example, letting the numerical digits 0 and 1 stand for the qualities "male" and "female", respectively.

Qualitative data can be "turned" into quantitative data. For example, when we use a scale of 1-5 to represent the range of responses to questions from "strongly disagree" to "strongly agree" on survey-type questionnaires, one could regard the result as qualitative (i.e., one of the list of "strongly disagree", "disagree", "no opinion", "agree" or "strongly agree") or as quantitative (the values 1, 2, 3, 4, and 5 measuring the degree of agreement with the statement given).

Quantitative data we collect can be either discrete or continuous.

Here is an example of discrete data: we toss a die 50 times and record the number that comes up (from 1 to 6). This will yield a string of 50 numbers. The first step in organizing the data is a frequency table. We tally the number of times we got a 1, the number of times we got a 2, and so on. We can then calculate the relative frequency; for instance, if we rolled a 1 a total of 9 times, the relative frequency is 9/50 = 0.18. This information can be graphed and will yield the frequency distribution, that is, how frequently each number was recorded (rolled). The X (number on the die) is called a discrete variable, because it has a finite number of values (having limits, countable).

Continuous example: Suppose you draw a sample of 200 men from a certain population and record their height in inches. The ultimate aim would be to infer the average height of males in the whole population. In this example, our variable X is continuous; an individual's height might be any value, such as 64.328 inches. It no longer makes sense to talk about the frequency of this specific value; chances are we'll never again observe someone who is exactly 64.328 inches tall. Instead we group our data into bins, or classes; each bin/class spans a specific range of values (e.g. from 58.5 to 60.5 inches). Thus we will group all measurements that fall between 58.5 and 60.5 inches into one class/bin, and so on. We will then tally the number of measurements (values) in each bin/class and get the frequency of heights within a class that includes a specific range of values. We plot the frequency of each bin in a graph called a histogram. A visual illustration of the usefulness of doing this is shown below.

In this experiment AP Bio students were working with Fast Plants and measuring the number of trichomes, the hairs on the petiole of the first true leaf, of 100 plants. They put their data into an Excel spreadsheet and plotted it. Below you will see a graph representing their data. A graph is supposed to help us visualize our data and draw conclusions based on the data presented. Look at the graph below. What can you say about the number of trichomes of the 100 plants? Do you see any patterns, trends, something that jumps out at you?

____________________________________________________________
____________________________________________________________

Do you think this visual representation of the students' data is useful?
____________________________________________________________

[Figure: bar graph titled "Trichomes per plant in the Parent Gen."; one bar per plant (x-axis ticks from 1 to 97), y-axis from 0 to 40]

What if the students wanted to determine the average number of trichomes and how many plants there are that are "abnormally" hairy? In the graph below the students constructed a histogram by placing their measurements in several bins, each bin being a group of plants within a range of number of trichomes.
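For readers who want to see the binning step spelled out, here is a minimal Python sketch. It is only an illustration: the trichome counts are made up (they are not the students' data), the bin limits are arbitrary, and the helper name `frequency` is hypothetical. Each bin is named by its upper limit, and one extra slot catches values above the last bin.

```python
from bisect import bisect_left

def frequency(values, upper_limits):
    """Tally values into bins named by their upper limit.

    A value v is counted in the first bin whose upper limit is >= v.
    The final slot counts values larger than every upper limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        counts[bisect_left(upper_limits, v)] += 1
    return counts

# Hypothetical trichome counts for 12 plants (NOT the students' data).
trichomes = [0, 2, 3, 5, 6, 6, 8, 9, 11, 12, 14, 21]
upper_limits = [5, 10, 15, 20]                    # bins 0-5, 6-10, 11-15, 16-20
labels = ["0-5", "6-10", "11-15", "16-20", ">20"]

counts = frequency(trichomes, upper_limits)

# Print a sideways text histogram: one '#' per plant in the bin.
for label, count in zip(labels, counts):
    print(f"{label:>6} | {'#' * count} ({count})")
```

Changing `upper_limits` and rerunning shows how the choice of bins changes the shape of the histogram.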

Draw 2 conclusions from the histogram, something that wasn't obvious in the first graph.
____________________________________________________________
____________________________________________________________

Looking at the histogram, can you come up with one question to ask regarding the population of plants, something that can be answered with experimentation?
____________________________________________________________
____________________________________________________________

Descriptive Statistics: the tools we use to describe (characterize) a sample

A Frequency Table and a Frequency Distribution (the histogram) are the first step in organizing quantitative data so that we can characterize it, describe it. You are already familiar with the descriptive statistics mean, mode, median, range and even standard deviation, but we will introduce them here a little differently. In statistics we use two descriptions of the frequency distribution: 1) the central point of the distribution and 2) its spread.

1. Center, central point: several different concepts of the "center": Mode, Median and Mean

Mean: the average; Mode: the most frequent value; Median: the middle value of an ordered set of values

Let's examine the following hypothetical frequency distribution. The X values range from 1 to 7. The average/mean is 4.

You can see that 4 is in the exact center of the histogram. 4 also has the highest bar, which means it is the most frequent value, or mode. And it so happens that 4 is also in the middle of the ordered set of values (1; 2; 3; 4; 5; 6; 7), so 4 is the median. In this hypothetical histogram, the center is the mean, mode and median, so all 3 concepts of center overlap perfectly. This histogram/frequency distribution is called the normal distribution; you may have seen it as a bell-shaped curve (you would get a bell-shaped curve if you drew a line connecting the middle of the tops of each bar). A normal distribution has a single peak and is symmetrical. Not all data shows a normal distribution. We will not concern ourselves today with other types of distributions, but it is important for you to understand that frequency distributions and the shapes of histograms vary. Below you will find some examples:

[Figure: histogram titled "First Generation Trichome, Sept. 2013"; axis ticks 0, 5, 10, 15, 20, 25, 30]

Most people use Excel to organize, characterize, and visualize data as well as to do calculations. Different versions of Excel, especially those that work on different computers and OSs, are slightly different. These differences are enough to frustrate users who don't do this frequently. This exercise is designed to show you how you can figure out on your own how to use features of Excel and how to get help with different functions.

Let's practice making our own histogram and determining the mean, mode and median of a set of sample data:

1. Open the Excel spreadsheet Trichomedata.xlx. The spreadsheet contains the raw data of the AP Bio students' trichome experiment we talked about above.
2. Label Columns I, J and K with Mean, Median and Mode respectively; then calculate the mean, median and mode. To do this, type the equal sign in the appropriate cell, which activates the formula bar; select AVERAGE (for mean), MEDIAN or MODE; then select all of the data (A2:A101) and hit Enter. This will return the result of the calculation.
3. Copy and paste all the values from Column A into Column B (label Column B "ordered data"); select all of the data in Column B and use the Sort function (Sort, Ascending, use column headers). This will sort the values and help you determine the bins you are going to use for the histogram.
What is the smallest value? _________ What is the largest value? _________ What is the range of the data? ________
4. Decide how many bins to use and what the range of each bin will be. In Column E (label as Bins) put the high number of the range of each of your bins, e.g. a bin with range 6-10 would read 10. Label Column F as frequency or number; this will be your tally, e.g. how many plants had between 6 and 10 trichomes. You can peek back and see what the students did, or you can use your own method for deciding what bins to make.
5. Use Excel Help and type in Frequency; this will take you to the instructions on how to use this formula (I will help you as well). In the first cell, F2, type in =FREQUENCY(A2:A101,E2:E10). The first set in the () tells the function where the data is that needs to be sorted and counted; the second set in the () tells the function where the bins are and what range they have. In the above example you've made 9 bins, one in each of cells E2-E10. Note that the formula must be entered as an array formula. How this is done depends on the version of Excel and/or computer OS; instructions are in the help section for the FREQUENCY function.
6. Column F should now contain the frequencies, or our Y values for the histogram. Our X values would be the actual bins; it is convenient to label them with the actual range, not just the high number, e.g. the label is 6-10, NOT just 10. Copy and paste into a separate column the ranges you used to make the bins; in the column next to it copy -> paste special (values) the numbers from Column F. Select the data from both columns and click Insert Chart; choose the bar/column type of graph.
7. You have now made a histogram that hopefully looks similar to the one above. Note where on the histogram the 3 center points you calculated earlier fall: where are the mean, mode and median on the histogram?

____________________________________________________________

Depending on how you made your bins, your histograms may look slightly different, but one thing is clear: the histogram does not look like the example one that was perfectly symmetrical. In real experimentation data rarely looks this perfect, especially when it is continuous data (if you don't remember what that means, scroll up). The mean, median and mode don't overlap perfectly, but they are close enough if the data follows a normal distribution.

2. Spread: deviations or measures of the spread

The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller. Consider the figures below. The mean, mode and median value in both cases is 5. In the figure on the left, data values range from 3 to 7; whereas in the figure on the right, values range from 1 to 9. The figure on the right is more variable, so it has the greater spread.
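The same idea can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the two data sets are invented to resemble the description above (both centered on 5, one running from 3 to 7 and one from 1 to 9), and the helper name `describe` is hypothetical.

```python
import statistics

# Two made-up data sets (assumptions for illustration, not the figure's data).
narrow = [3, 4, 4, 5, 5, 5, 6, 6, 7]         # values from 3 to 7
wide = [1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9]     # values from 1 to 9

def describe(data):
    """Return the three measures of center plus the range."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
    }

narrow_summary = describe(narrow)
wide_summary = describe(wide)
print("narrow:", narrow_summary)
print("wide:  ", wide_summary)
```

The three measures of center come out the same for both lists; only the range tells them apart.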

If we determine only the mean (or median and mode), both data sets look identical, but they aren't, so measures of the spread are another very important set of descriptive statistics. The simplest of these you are already familiar with: the range. Range = largest value - smallest value. For the examples above, the range tells us that the figure on the right has a greater spread. However, as already discussed, real frequency distribution data rarely looks so perfectly symmetrical. A more useful measure of the spread, and the one widely used by scientists, is the standard deviation. In the example above, the data points on the left are closer to the mean than in the figure on the right, so the left data will have a smaller standard deviation. Here is the formula (no need to memorize):

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

Let's examine what it means. $X_i$ is any single measurement you have collected, and $\bar{X}$ is the mean. The deviation of each observed value from the mean is $X_i - \bar{X}$. This measures how different each observed value is from the mean. The average deviation then would be to sum all the individual deviations and then divide by n, where n is the number of independent measurements. Take a moment and calculate the mean deviation of the data displayed on the histogram on the left in the above figure. Remember the values range from 3 to 7 and the mean is 5. What number did you get? Is this measure useful in determining how varied the data is? Explain.
____________________________________________________________

If we take the squared value of each deviation, we will never get a negative number; we can then determine the average squared deviation, $\frac{\sum (X_i - \bar{X})^2}{n}$. The $\sum$ sign simply means sum: we add up all the squared deviations.

This measure is useful in describing the variation of our sample. But why do we care about the spread, the variation in our samples? Consider the example we discussed earlier in which we measure the height of 200 men. Knowing what the spread is will tell us whether there are any extremely short men and/or whether there are some extremely tall men in this population. What if we measure the height of 200 men from Barnegat (population 20,936) and then measure the height of 200 men in Falkenberg (population 20,035), a town in Sweden? We would like to make the claim that men from Falkenberg, Sweden are taller than men from Barnegat. If we simply use the average squared deviation, we would describe both samples. However, we would like to infer from these two samples the average height of the whole population; we want to make a statistical inference about the two populations and whether they truly differ. To be able to do this we need to divide the sum of all squared deviations by n-1, not n. n-1 is the degrees of freedom; remember Mr. Anderson's explanation in the video you watched. So now we have $\frac{\sum (X_i - \bar{X})^2}{n - 1}$. Because the deviations are squared, it would be nice to 'un-square' them, otherwise known as taking the square root; thus the formula for the standard deviation becomes

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
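To make the steps concrete, here is a small Python sketch that builds the standard deviation one step at a time: deviations, squared deviations, division by n-1, then the square root. The five heights are invented for illustration and the function name `sample_sd` is hypothetical. Python's `statistics.stdev` also divides by n-1, so it is used here only as a cross-check.

```python
import math
import statistics

def sample_sd(values):
    """Standard deviation built step by step, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]          # Xi minus the mean
    squared = [d ** 2 for d in deviations]           # never negative
    variance = sum(squared) / (n - 1)                # divide by n - 1, not n
    return math.sqrt(variance)                       # "un-square"

# Hypothetical heights in inches (made up for this sketch).
heights = [64.0, 66.5, 68.0, 69.5, 72.0]

sd = sample_sd(heights)
print(f"mean = {sum(heights) / len(heights):.1f} in, SD = {sd:.2f} in")
print(f"library cross-check: {statistics.stdev(heights):.2f} in")
```

Dividing by n instead of n-1 in the `variance` line gives a slightly smaller number, which is the difference the text describes between describing a sample and inferring about a population.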

Let's explore what the standard deviation means. Typical data will show a normal distribution (see figure). Empirical rule: In a normal distribution, about 68% of values are within one standard deviation of the mean, about 95% of values are within two standard deviations of the mean, and about 99.7% of the values are within three standard deviations of the mean. Standard deviation is noted as either s or SD. Often the SD is plotted in bar graphs as an error bar; we will talk about error bars in a bit. You will not be asked to calculate the standard deviation on the AP exam, nor should you expect to calculate it by hand (with a calculator) in this class. However, you will be asked to manipulate data using Excel. So let's practice. Go back to the spreadsheet with the trichome data. You have already calculated the mean (average) number of trichomes.

1. Calculate the standard deviation for this data set (=STDEV(A2:A101)) and write it down: _______
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________
2. At the bottom of the spreadsheet are tabs. Click on the second tab, "Second Gen". This is a second set of data on trichome numbers. The students have observed and counted the trichomes on the second generation of plants. We will not do any histograms for this second set of data.
3. Calculate the mean and the standard deviation for the 2nd Gen of plants.
Between which two values are 68% of the data located? __________ and _____________
Between which two values are 95% of the data located? __________ and _____________
6. Open the third tab, "Analysis". Label column A "1st Gen" and column B "2nd Gen". Place (copy from the other tabs and paste special, values) in the appropriate columns the mean and SD for both generations of plants. Mark in Column C which row is which.

7. The students did something to the second generation of plants (never mind what), and we want to find out whether there is a difference in trichome number between the two generations. Make a bar graph plotting the average trichome number for each generation of plants.
8. Using both the bar graph, which compares the means of the two generations of plants, and the spread of the values you calculated above (68% of data located between ...), can you draw a conclusion about whether there is a change in trichome numbers in the 2nd generation? Justify your answer.
____________________________________________________________
____________________________________________________________

How can we be certain that whatever the students did in that plant experiment resulted in higher numbers of trichomes? Or, in the previous example when we measured the height of 200 men from Barnegat and 200 men from Falkenberg, Sweden, what is our confidence that the 200 men we have sampled truly represent the height of the respective populations of Barnegat and Falkenberg? To help you better understand the difference between the sample mean and the true mean (the actual mean of the whole population) and the significance of sampling and sample size, we will do an activity which will involve several computer simulations, some calculations and some graphing.

Directions:
1. Go to http://www.hhmi.org/biointeractive/sampling-and-normal-distribution; click on the introduction tab and read.
2. Complete items 1-10 on your own. Share your individual results: specifically the range from your table and what the distributions looked like.
3. Pair up and spend some time discussing and ultimately deciding what an appropriate sample size would be. Then write it down. The simulation can run with the following sample sizes: 4, 9, 16, 25, 100, 400 and 1000. Since we have already done sample size 4, we will now do the rest in item 11 of the worksheet. You can find the sample size that is closest to your ideal sample size, or I can assign you a sample size. We will do #11 and #12 in pairs. I will provide you with grids to plot the sample means.
4. Compare class data and discuss question 13: For a population of infinite size, what sample size would be small enough to be feasible to collect data from while still giving you a good representation of the mean for the population? Explain why you selected this sample size. Record your answer here:
____________________________________________________________
____________________________________________________________

Up until now we have talked about descriptive statistics: mean, range, standard deviation, etc. But in science we would like to be able to draw an inference, to draw a conclusion if you will, about the whole population based on our sample measurements. The sample mean is not necessarily identical to the mean of the entire population (the true mean).

You already discovered that the appearance of a histogram depends on the sample size, and that with small sample sizes the distribution may not be normal, even though we are sampling a normally distributed population. Additionally, we saw that the larger the sample size, the better representation we get of the mean of the population. Unfortunately it is not practical to have very large sample sizes, and often sample sizes are quite small. So we need a measure of our confidence that the sample mean is a good representation of the true mean (the population mean).

Every time you take a sample and calculate a sample mean, you would expect a slightly different value (you saw this in the simulation). In other words, the sample means themselves have variability. This variability can be expressed by calculating the standard error of the mean (abbreviated as SE or SEM). Standard error is an inferential statistic; this means we can use it to draw an inference, to infer. Inferential statistics give us the confidence (the opposite being uncertainty) that a sample mean is a representation of the true mean; they can also be used to infer differences between the means of two different populations (remember the plants and the men). Confidence (uncertainty) is usually represented as error bars on graphs. For instance, in the graph you constructed on trichome density, each bar should have an error bar (+/-; a line above and a line below the mean) indicating the range in which we can say with 95% confidence that the true (population) mean lies. But more on error bars later. Right now we will explore the inferential statistic called the standard error of the mean.
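The idea that sample means have their own variability can also be explored with a short Python simulation. This is a sketch, not the HHMI simulation: the population values (true mean 68, SD 3), the sample size of 25 and the helper names are all assumptions made for illustration. It uses the standard formula SEM = SD / sqrt(n), which is stated here as background rather than taken from the text above.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives repeatable numbers

# Assumed population for illustration: normal, true mean 68, true SD 3.
TRUE_MEAN, TRUE_SD, N = 68.0, 3.0, 25

def draw_sample(n):
    """Draw n random values from the assumed normal population."""
    return [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]

def sem(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# One "experiment": a single sample gives one estimate of the SEM.
estimated_sem = sem(draw_sample(N))

# Many repeated experiments: how much do the sample means themselves vary?
sample_means = [statistics.mean(draw_sample(N)) for _ in range(2000)]
sd_of_means = statistics.stdev(sample_means)

print(f"SEM estimated from one sample:   {estimated_sem:.2f}")
print(f"SD of 2000 simulated sample means: {sd_of_means:.2f}")
print(f"Theoretical value (3 / sqrt(25)):  {TRUE_SD / math.sqrt(N):.2f}")
```

Increasing `N` and rerunning shows the sample means clustering more tightly around the true mean, which is what a smaller SEM expresses.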

Go to Part II of your BioInteractive WS. On the simulation click the arrow to move to the next module, Standard Error of the Mean. Complete items 1-11. DO NOT fill out the last two columns of Table 3.

Another inferential statistic is the confidence interval, or CI; typically we talk about the 95% CI. The 95% CI is a range centered around the sample mean. This is a range of values you can be 95% confident contains the true mean. When scientists want to compare the means of two different populations, they will plot the mean of each population with error bars. The error bars give a range; they can be +/- SE, +/- 2 x SE, or the 95% CI. Let's briefly examine them.

SE bars: SE (SEM) is an estimate of the standard deviation of the mean (the true mean). In the simulation you saw that when the sample size is large, the SE as calculated by the formula above is very close to the standard deviation of the mean. We can't do an experiment 100 or 400 times like in the simulation, but we can estimate the standard deviation of the mean (the true mean). For large sample sizes the calculated SE is a good approximation of the SD of the mean of the population, and as such we can apply the empirical rule (see above) and say that 95% of the values are within two standard deviations of the mean, or in this case 2 SEM. So for large sample sizes, 95% CI = Mean +/- 2 x SEM. Often you will see 2 x SEM bars and 95% CI bars used interchangeably. If you encounter Mean +/- SEM (a single SEM bar), all you need to do is visually double the error bar to get the 95% CI.

CI (confidence interval) bars are the best, but unfortunately they are rarely used by biologists. Confidence intervals are calculated; there is a statistical formula. They are dependent on the SEM, the sample size and something else. We will not discuss how the CI is calculated, but it is important to realize that doubling the SEM is only an approximation of the true value of the CI. If the sample size is 10 or more, then 2 x SEM is approximately the 95% CI. However, for smaller sample sizes this is not the case. If n = 3, we need to multiply the SEM by about 4 to get the 95% CI; for n = 5 we multiply by about 2.5, etc.

Look at the picture on the right. This is similar to what you saw in the simulation, but let's imagine a "real life" situation. Imagine that the government has managed to measure the height of every 4-year-old boy who currently lives in the US (this is not practical, but it is not impossible). The average height for 4-year-old boys in the US is 40 inches, so the government knows the true mean of the population because that average was derived from measuring EVERY boy.

However, the government is not sharing the data; instead pediatricians rely on their own measurements. Since no one pediatrician can measure all boys, they rely on sampling the population to determine what is an average height for 4-year-olds. 20 large studies were done all over the country, each measuring the height of 400 boys and determining means, SD, SE and CI. The 20 different means and the 95% CIs are represented in the figure. The dots are the means, and the lines (error bars) are the 95% CI, the range in which we expect to find the true mean 19 out of 20 times. In the figure you will see that in 18 out of the 20 experiments the 95% CI captured the true mean; 2 didn't (open circles). In other words, the mean of the data with SE or CI bars gives an indication of the region where you can expect the mean of the whole population, but despite the 95% confidence you may still end up not capturing the true mean.

Error Bars: The table below summarizes common error bars used in Biology.

Note: You are NOT expected to know these formulas; they are given to you so you can understand what the error bars mean. Most problems you will encounter will be using 2 x SEM; just be aware that if the sample size is small, you will need to visually extend the error bars (~3-4 x SEM). You may see graphs with +/- SD. We can't use them to infer about the population mean or to compare two different population means; SD is a descriptive statistic and so are SD error bars. They simply show the spread, the variation of the data collected. However, if the description of the experiment includes a sample size (n), you can calculate the SE from the given SD.

Let's return to our sample trichome data. You've calculated the mean and the SD.
1) Calculate the SEM for both generations using the formula above, and place the calculations in the row under your SD.
2) Calculate the 2 x SEM value and place it under your SEM calculations.
3) Add +/- 2 x SEM error bars to your graph. (I will now show you how to add error bars to your graph; I strongly encourage you to take notes.)

Since our sample size is 100, we can be confident that 2 x SEM is a good approximation of the 95% CI. Now that we have added a 95% confidence interval, do your conclusions about the data change? Are you more or less confident in your analysis?

____________________________________________________________

So, how do we interpret error bars? When we compare two different populations and want to make a statement that they differ or do not differ in their means, error bars come in handy. Pay attention, as you will be expected to interpret experimental results based on error bars.

1. If the means of the two populations are different and their error bars (2 x SEM or 95% CI) do not overlap each other, nor do they overlap the means, this is pretty strong evidence that there is a difference between the populations.
2. If the error bars overlap just a little bit, but do not overlap the means, you do not have strong evidence either way. If only this type of data is presented, you can say that the data is inconclusive.
3. When the error bars overlap both each other and the means, then most likely there is no difference between the populations.

Take-home lesson #1: Statistics is a guide. It does not prove or disprove anything. It guides us in the interpretation of data. To be 'certain', scientists will collect multiple lines of evidence (perform multiple different experiments, test their predictions in multiple different ways, have independent scientists check each other, etc.) and also try to make biological sense of their data (link to what is known). We will talk more about statistically significant differences and hypothesis testing later on.

In order to practice using an Excel spreadsheet to analyze data, complete the tutorials listed below.
http://www.hhmi.org/biointeractive/spreadsheet-data-analysis-tutorials?fref=gc
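As a recap, the three error-bar rules above can be written as a small Python function. This is only a sketch of one reading of the rules: the function name `interpret_error_bars`, the example numbers and the choice to treat "overlaps the means" as "at least one bar reaches the other group's mean" are assumptions, and the result is a guide for discussion, not a statistical test.

```python
def interpret_error_bars(mean_a, half_width_a, mean_b, half_width_b):
    """Apply the three rules to two means with +/- error bars.

    half_width is the length of one arm of the bar (e.g. 2 x SEM).
    """
    low_a, high_a = mean_a - half_width_a, mean_a + half_width_a
    low_b, high_b = mean_b - half_width_b, mean_b + half_width_b

    # Rule 1: the bars do not touch at all.
    bars_overlap = low_a <= high_b and low_b <= high_a
    if not bars_overlap:
        return "strong evidence of a difference"

    # Rule 3 (assumed reading): a bar reaches the other group's mean.
    a_reaches_b_mean = low_a <= mean_b <= high_a
    b_reaches_a_mean = low_b <= mean_a <= high_b
    if a_reaches_b_mean or b_reaches_a_mean:
        return "most likely no difference"

    # Rule 2: bars overlap a little, but neither reaches the other mean.
    return "inconclusive"

# Hypothetical means and 2 x SEM values (made up for this sketch).
print(interpret_error_bars(10, 2, 20, 3))   # bars far apart
print(interpret_error_bars(10, 3, 15, 3))   # bars overlap slightly
print(interpret_error_bars(10, 3, 12, 3))   # bars cover each other's mean
```

Plugging in the means and 2 x SEM values from the trichome spreadsheet is one way to compare this rule of thumb against a conclusion drawn from the graph.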