
Post on 01-Oct-2020


Quick overview of basic concepts in Statistics

(and a few bad jokes)

Motivation

Statistical concepts can be subtle and difficult to explain

I thought it would be useful to review some central ideas in Statistics

Overview

Descriptive statistics
•  Distributions, sampling
•  Descriptive statistics and plots
•  Sampling distributions

Inferential statistics
•  Point and interval estimates
•  Confidence intervals and the bootstrap
•  Hypothesis tests

If there is interest: neural data analysis

Where do data come from?

Distributions!

Truth → Data

How do we get data?

Simple random sample: each member in the population is equally likely to be in the sample
•  Random selection

Soup analogy!

Q: Why is this good?
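A minimal sketch of drawing a simple random sample, using only Python's standard library. The population of ID numbers and the helper name `simple_random_sample` are assumptions made for illustration; they do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: ID numbers 0..999 standing in for its members.
population = list(range(1000))
sample = simple_random_sample(population, n=50, seed=1)
print(len(sample), sample[:5])
```

Because selection is random, the sample tends to resemble the population, much like a spoonful from a well-stirred pot of soup.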

An Example Dataset (flight delays)

[Data table: each row is a case and each column is a variable; one column is marked as a categorical variable and one as a quantitative variable]

What is a good first step when analyzing data?

What is a useful plot for categorical data?

What is a useful plot for quantitative data?

Box and violin plots

Joy plots

Neat fact: Some statisticians find violin plots ugly
•  Why do they find them ugly?

X-mas ornaments

Dynamite plots

Neat fact: Many statisticians hate dynamite plots
•  Why do they hate them?

What is a statistic?

A: A single death is a tragedy; a million deaths is a statistic.

- Joseph Stalin

Descriptive statistics describe the data you have collected
•  Usually denoted with roman characters

•  A single categorical variable: proportion
•  A single quantitative variable: the mean
•  A pair of quantitative variables: the correlation

Example statistic: correlation coefficient

The correlation coefficient r is a statistic that measures the linear association between two variables X and Y

r is an estimate of ρ
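As a rough illustration, r can be computed directly from its definition: the sum of products of deviations from the means, scaled by the spread of each variable. The function name `correlation` and the data below are made up for this sketch and are not from the flight delays dataset.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 3))
```

A value of r near +1 or -1 indicates a strong linear association; a value near 0 indicates little linear association.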

Side note: correlation ≠ causation


Descriptive statistics are usually denoted with roman characters

•  A single quantitative variable: the mean
•  A single categorical variable: proportion
•  A pair of quantitative variables: the correlation

A variety of statistics that can be used to describe data

p_social, r

Population/process parameters are usually denoted with Greek characters

•  A single quantitative variable: the mean
•  A single categorical variable: proportion
•  A pair of quantitative variables: the correlation

Statistics have corresponding parameters

π_social, ρ

(infinite data)

Statistical inference

We use sample statistics to make judgements about population parameters

Sample statistics: p, x̄, s, r

Population parameters: π, μ, σ, ρ

Statistical inference

Estimation

Point estimation:

x̄ is a point estimate for μ

Regression

[Scatter plot: shark attacks and ice cream sales]

In linear regression, we try to fit a line to our data

Truth: y = β0 + β1·x
Fit: ŷ = b0 + b1·x

The b_i's are usually chosen to minimize the MSE (the mean of the squared differences between y and ŷ)
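A small sketch of fitting ŷ = b0 + b1·x by minimizing the MSE, using the standard closed-form least-squares solution. The function names `fit_line` and `mse` and the data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the mean squared error of b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

def mse(xs, ys, b0, b1):
    """Mean of the squared differences between y and the fitted ŷ."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(b0, b1, mse(x, y, b0, b1))
```

Any other choice of intercept or slope gives an MSE at least as large on these data, which is what "minimize the MSE" means here.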

Regression caution #1

Avoid trying to apply the regression line to predict values far from those that were used to create the line

Statistics have distributions too!

A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population

Sampling distributions are often approximately Normal
•  Due to the central limit theorem: sums of many independent random variates are approximately Normally distributed

x̄_1, x̄_2, x̄_3, …, x̄_k
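One way to see a sampling distribution is to simulate it: draw many samples of the same size from one population and compute the mean of each. The sketch below assumes a skewed, exponentially distributed population, which is a choice made for illustration; `sampling_distribution` is a hypothetical helper name.

```python
import random
import statistics

def sampling_distribution(population, n, k, seed=0):
    """Means of k simple random samples, each of size n, from one population."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(k)]

# Hypothetical skewed population (exponential with mean 1).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sampling_distribution(population, n=50, k=2000)
print("population mean:", statistics.mean(population))
print("mean of sample means:", statistics.mean(means))
print("SD of sample means (standard error):", statistics.stdev(means))
```

Even though the population is skewed, a histogram of `means` should look roughly bell-shaped and be centered near the population mean.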

Estimation

Point estimation: x̄ is a point estimate for μ

Interval estimate: point estimate + margin of error

The population mean is somewhere in here

Confidence Intervals

A confidence interval for a parameter is an interval computed by a method that will capture the parameter a specified proportion of times
•  (if the sampling process were repeated many times)

The success rate (proportion of all samples whose intervals contain the parameter) is known as the confidence level

Think ring toss…

Parameter exists in the ideal world

We toss intervals at it

95% of those intervals capture the parameter

(for a 95% CI)
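The ring toss idea can be checked by simulation: fix a known parameter, repeatedly draw samples, build an interval from each, and count how often the intervals capture the parameter. This sketch assumes Normal data with μ = 10 and σ = 2 and uses the interval x̄ ± 2·SE; those values and the name `ci_coverage` are illustrative assumptions.

```python
import math
import random
import statistics

def ci_coverage(mu=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Proportion of intervals x̄ ± 2·SE that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # estimated SE of the mean
        if x_bar - 2 * se <= mu <= x_bar + 2 * se:
            hits += 1
    return hits / trials

rate = ci_coverage()
print("proportion of intervals capturing mu:", rate)
```

The printed proportion should land close to 0.95, which is the confidence level of the method rather than a statement about any single interval.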

Wits and Wagers…

The bootstrap to estimate standard errors

The plug-in principle

95% Confidence Intervals

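If the bootstrap distribution is ~symmetric we can use percentiles to calculate a 95% confidence interval

Formulas also exist for calculating CIs
•  SE of the mean can be estimated as: s/√n (s is the sample standard deviation, n is the sample size)
•  CI is: Statistic ± 2·SE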
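A rough sketch of the bootstrap for the mean: resample the observed data with replacement (the plug-in principle treats the sample as a stand-in for the population), recompute the mean each time, then read off the standard error and a percentile interval. The data and the helper names `bootstrap_means` and `percentile_ci` are made up for illustration.

```python
import math
import random
import statistics

def bootstrap_means(sample, reps=5000, seed=0):
    """Means of reps resamples drawn with replacement, each the original size."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

def percentile_ci(boot_stats, level=0.95):
    """Percentile interval: cut off (1 - level)/2 of the values in each tail."""
    ordered = sorted(boot_stats)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]

# Made-up illustrative sample.
sample = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
boot = bootstrap_means(sample)
boot_se = statistics.stdev(boot)  # bootstrap estimate of the SE
ci = percentile_ci(boot)
formula_se = statistics.stdev(sample) / math.sqrt(len(sample))
print("bootstrap SE:", boot_se, "formula SE:", formula_se)
print("95% percentile CI:", ci)
```

The bootstrap SE and the formula-based SE should be of similar size, and the percentile interval should be roughly the sample mean ± 2·SE when the bootstrap distribution is close to symmetric.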

Question: what is a p-value?

A p-value is the probability of getting a statistic as extreme or more extreme than the observed statistic from a null model (null distribution)

Hypothesis tests in 2 steps

1. Create a distribution showing what the statistic would look like if there was no effect (H0 is true)
•  Null distribution (distribution if no effect)

2. Show that the statistic you observed is unlikely to come from this null distribution
•  P-value is: Pr(S ≥ obs_stat | H0)
•  P-value is not: Pr(H0 | data)
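The two steps can be sketched with a toy example that is not from the slides: suppose we observe 15 heads in 20 coin flips and H0 says the coin is fair. The function names `null_distribution` and `p_value` are hypothetical.

```python
import random

def null_distribution(n_flips, reps, seed=0):
    """Step 1: number of heads in n_flips of a fair coin, simulated reps times."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(reps)]

def p_value(null_stats, obs_stat):
    """Step 2: estimate Pr(S >= obs_stat | H0) as a proportion of null statistics."""
    return sum(s >= obs_stat for s in null_stats) / len(null_stats)

null_stats = null_distribution(n_flips=20, reps=10_000)
p = p_value(null_stats, obs_stat=15)
print("estimated p-value:", p)
```

A small p-value says the observed statistic would be unusual if H0 were true; it does not give the probability that H0 is true.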

Trial metaphor of hypothesis testing

1. Assume innocence: H0 is true
•  State H0 and HA

2. Gather evidence
•  Calculate the observed statistic

3. Create a distribution of what evidence would look like if H0 is true
•  Null distribution

4. Assess the probability that the observed evidence would come from the null distribution
•  p-value

5. Make a judgement
•  Assess whether the results are statistically significant

Types of hypothesis tests

Permutation test: Create a null distribution by simulating many random assignments
•  1. Shuffle the condition labels
•  2. Compute statistic on shuffled labels
•  3. Repeat many times to get a null distribution

Parametric tests: based on the fact that the distribution of the statistic is known
•  t-tests, ANOVAs, chi-squared tests, etc...
•  Less robust (based on Normal approximations)

Visual hypothesis tests

“Visual hypothesis tests” create visual plots of randomly rearranged points
•  i.e., a set of ‘innocent’ plots where there is no pattern

If you can tell the real data from the plots of randomly generated data this is equivalent to rejecting the null hypothesis

Which plot shows the real data?


See Wickham et al., 2010

Permutation test example: Hypothesis tests for comparing two means

Question: Is this pill effective?

Experimental design

Take a group of participants and randomly assign:
•  Half to a treatment group where they get the pill
•  Half to a control group where they get a fake pill (placebo)
•  See if there is more improvement in the treatment group compared to the control group

[Diagram: the participant pool is split by random assignment into a control group and a treatment group]

Hypothesis tests for differences in two group means

1. State the null and alternative hypothesis
•  H0: μ_Treatment = μ_Control or μ_Treatment – μ_Control = 0
•  HA: μ_Treatment > μ_Control or μ_Treatment – μ_Control > 0

2. Calculate statistic of interest
•  x̄_Treatment - x̄_Control

3. What should we do next…?

3. Create the null distribution!

Reconstructed participant pool data under H0

Shuffle data for random assignment consistent with H0

[Diagram: the control group and treatment group data are pooled and shuffled into a shuffled ‘treatment group’ and a shuffled ‘control group’]

One null distribution statistic: x̄_Shuff_Treatment - x̄_Shuff_Control

Repeat 10,000 times

p-value = .064
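The shuffling procedure above can be sketched as follows. The improvement scores are invented for illustration, so the resulting p-value will not match the .064 from the slides; `permutation_test` is a hypothetical helper name.

```python
import random

def permutation_test(treatment, control, reps=10_000, seed=0):
    """One-sided permutation test for a difference in means (treatment > control)."""
    rng = random.Random(seed)
    n_treat = len(treatment)

    def mean(values):
        return sum(values) / len(values)

    obs_stat = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)  # under H0 the group labels do not matter
    null_stats = []
    for _ in range(reps):
        rng.shuffle(pooled)                   # random reassignment consistent with H0
        shuff_treat = pooled[:n_treat]
        shuff_control = pooled[n_treat:]
        null_stats.append(mean(shuff_treat) - mean(shuff_control))
    # p-value: proportion of shuffled statistics at least as large as the observed one.
    p = sum(s >= obs_stat for s in null_stats) / reps
    return obs_stat, p

# Made-up improvement scores for illustration.
treatment = [5, 7, 6, 9, 8, 7, 10, 6]
control = [4, 6, 5, 7, 5, 6, 8, 5]
obs_stat, p = permutation_test(treatment, control)
print("observed difference:", obs_stat, "p-value:", p)
```

The test is one-sided because HA states that the treatment mean is larger than the control mean.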

Step 4?

Fisher vs. Neyman

Question: should you report exact p-values?
•  p < .05
•  p = .021

Null-hypothesis significance testing (NHST) is a hybrid of two theories:

•  1. Significance testing of Ronald Fisher

•  2. Hypothesis testing of Jerzy Neyman and Egon Pearson

Fisher (1890-1962)    Neyman (1894-1981)

Neat fact: they hated each other

Type I and Type II Errors

                 Reject H0                            Do not reject H0
H0 is true       Type I error (α) (false positive)    No error
H0 is false      No error                             Type II error (β) (false negative)
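The top-left cell of the table can be illustrated by simulation: when H0 is true, a test at level α = .05 should reject about 5% of the time. The sketch assumes Normal data with known σ = 1 and a two-sided z-test with cutoff 1.96; both are assumptions chosen for simplicity, as is the name `type_i_error_rate`.

```python
import math
import random

def type_i_error_rate(z_cutoff=1.96, n=25, trials=5000, seed=0):
    """Proportion of false rejections when H0 (mu = 0, sigma = 1) is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        z = x_bar * math.sqrt(n)  # (x̄ - 0) / (sigma / sqrt(n)) with sigma = 1
        if abs(z) > z_cutoff:
            rejections += 1       # H0 is true here, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print("estimated Type I error rate:", rate)
```

Estimating the Type II error rate (β) would need the same loop with data generated under a specific alternative, since β depends on the size of the true effect.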

Do reproducible research!

Tools exist that allow you to embed your analyses into your written reports:

•  R: R Markdown
•  Python/Julia/R: Jupyter
•  MATLAB: Live Scripts

John Tukey

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question”

- The future of data analysis. Annals of Mathematical Statistics 33(1), (1962), page 13.

What is Data Science?

See Donoho, 2017

Questions

Interactive single neuron analysis tutorial…

https://neuraldata.net/spike/

Created by Brooke Fitzgerald

Are we feeling ok about hypothesis tests?

American Statistical Association’s Statement on p-values
