sample size determination in auditing accounts receivable using
TRANSCRIPT
1
SampleSizeDeterminationinAuditingAccountsReceivable
UsingAZeroInflatedPoissonModelBy
KristenPedersenAProject
SubmittedtotheFacultyOf
WorcesterPolytechnicInstituteInpartialfulfillmentoftherequirementforthe
DegreeofMasterofScienceIn
AppliedStatistics
May2010
APPROVED:Dr.BalgobinNandram,ProjectAdvisor
Dr.HuongHiggins,Co‐Advisor
2
SampleSizeDeterminationinAuditingAccountsReceivableUsingAZeroInflatedPoissonModel
ABSTRACT
Inthepracticeofauditing,asampleofaccountsischosentoverifyiftheaccountsaremateriallymisstated,asopposedtoauditingallaccounts;itwouldbetooexpensivetoauditallacounts.Thispaperseekstofindamethodforchoosingasamplesizeofaccountsthatwillgiveamoreaccurateestimatethanthecurrentmethodsforsamplesizedeterminationthatarecurrentlybeingused.AreviewofmethodstodeterminesamplesizewillbeinvestigatedunderboththefrequentistandBayesiansettings,andthenourmethodusingtheZero‐InflatedPoisson(ZIP)modelwillbeintroducedwhichexplicitlyconsiderszeroversusnon‐zeroerrors.ThismodelisfavorableduetotheexcesszerosthatarepresentinauditingdatawhichthestandardPoissonmodeldoesnotaccountfor,andthiscouldeasilybeextendedtodatasimilartoaccountingpopulations.
3
Introduction
Whenauditingacompany’saccountsreceivable,asampleofaccountsischosentoverifyif
theaccountsaremateriallymisstated.Anauditorisresponsibleforselectingasampleof
thereportedbookvalues,andmakinginferencesabouttheaccuracyofthecompany’s
recordsbasedonthesample.“Accountingpopulationsarefrequentlyveryhighlyskewed
andtheerrorrateswhichtheauditorisseekingtodetectareoftenextremelylow”(Knight,
1979).Thispaperseekstofindamethodforchoosingasamplesizeofaccountsthatwill
provideamoreaccurateestimatethanthecurrentmethodsforsamplesizedetermination
thatarecurrentlybeingused.Inthispaper,allaccountsreceivablebookvaluesconstitute
thepopulation.Bookvalueisthevaluerecordedforaccountsorfinancialstatements.A
sampleisaselectionofsome,butnotall,oftheaccounts.Theinformationgatheredfrom
thesampledaccountsisusedtomakeinferencesaboutthepopulation.Theonlywaytoget
thetotalvalueoftheaccountsistoauditallaccounts,butthiswouldbeverycostly.Not
havingtoverifytheinformationonallaccountstomaketheseinferencesreducesthecost
incalculatingthequantityofinterest.Experiencehasshownthataproperlyselected
samplefrequentlyprovidesresultsthatareasgoodastheresultsfromverifyingall
accounts,(HigginsandNandram,2009).Dollarunitacceptancesamplingtoreduceneeded
samplesizeinauditingdataisthoroughlydiscussedinRohrback(1986).Ponemonand
Wendellstudiedthebenefitsofusingstatisticalsamplingmethodsinauditingdata,as
opposedtoanauditorusinghisexpertisetochoosethesample(PonemonandWendell,
1995).Althoughcostisnotspecificallyaddressedinthispaper,itprovidesmotivationfor
investigatingwhatsamplesizeisneededtogetefficientresults.
TheZero‐InflatedPoissonmodelwillbeintroducedinsection4,itisamodelto
accommodatecountdatawithexcesszeros.Ifacompanykeepsaccurateaccounts
receivable,thentherewouldbenoerrors,thismeansthattherewillbemorezerosinthe
datathanwouldbeaccountedforunderthestandardPoissonmodel.Therefore,wehope
getabetterestimateoftheappropriatesamplesize.In“MonetaryUnitSampling:
Improvingestimationofthetotalauditerror”(HigginsandNandram,2009),theZIP
methodisdicussedanditisshownthatforaccountingdataandothersimilardata
4
populations,thataboundundertheZIPmodelisreliableandmoreefficientthancommon
MonetaryUnitSamplingpractice.
RelatedResearch
Kaplan(1973)conductedsimulationstudiesbasedonhypotheticalpopulationsanderror
patternstoobservethebehaviorofratioanddifferenceestimatorswhenthepopulation
containsalimitednumberofzeros.Kaplanfoundastrongcorrelationbetweenthepoint
estimateandtheestimatedstandarderror,andshowedthatthenominalconfidencelevel
impliedbythenormaldistributionforlarge‐sampleconfidenceintervalswasfrequentlyfar
differentfromtheproportionofcorrectconfidenceintervals.
BakerandCopeland(1979)evaluatedtheuseofstratifiedregressionincomparisonto
standardregressioninaccountauditingdata.Aminimumof20errorsforthedifferenceof
bookvalueandauditvalueasanestimatorwasfoundtogivesuperiorresultsforthe
stratifiedregression.Aminimumof15errorsforaratioestimatorisneededforsuperior
resultsinstratifiedregression.Theusefulnessofthisinformationisquestionabledueto
thelowerrorrateofaccountingpopulations.
SahuandSmith(2006)exploredafullBayesianframeworkintheauditingcontext.They
foundthatnon‐informativepriordistributionsleadtoverysmallsamplesizes.Specifically,
ifthemeanofthepriordistributionisfarfromtheboundaryvalue(ortheperitem
materialerror),thenthesamplesizerequiredisverysmall.Inthiscase,thesamplesize
couldbesetbytheauditor.Whenthepriormeanisclosetothematerialerroralarge
samplesizeisrequired.
Berg(2006)proposedaBayesiantechniqueforauditingpropertyvalueappraisals.ABeta‐
Binomialmodelwasimplemented,andtheirprocedurerequiredsmallersamplesizes
relativetothosebasedonclassicalsamplesizedeterminationformulas.
5
Data
ThedatacomesfromLohr(1999)wheretherecordedvaluesb1,b2,b3,……bnforasampleof
=20accountsforacompanywerelisted,alongwithalltheaudited(actual)valuesa1,a2,
a3…..anofthesample.ThecompanyhadatotalofN=87accountsreceivable.Thetotalbook
valuefortheN=87accountsreceivableofthecompanywas$612,824.Thetotalbookvalue
ofallaccountsreceivableforacompanywouldbeB=b1+b2+…….+bN.Thetotalaudit
valueforacompany’saccountsreceivablewouldbeA=a1+a2+……..+aN.We’lldefinethe
errortobethedifferenceofthebookvalueandthetrueauditvalueforeachaccounti
(i=1,2,3,…N)as .Thismeansthatthetotalamountoferrorforthe
accountsis .Intheaccountingcontext,weexpectalargenumberof
accountstohave .Let betheerrorrateperdollar.Thereforetheerrorofeach
accountwillbe fori=1,…, .,andtheerrorrateperdollarforthesamplewillbe
willbe .Ourinitialestimatefor obtainedfromLohr’sdatais with
standarddeviation theseestimatesarethemeanandvariancecalculatedfromthe
sampleintheLohrtext.Arandomsampleof20accountswithreplacementwastakenfrom
thepopulationof87accounts.Thebookvalue,theauditvalue,andthedifferencebetween
thebookvalueandauditvaluefor16oftheaccountsarelistedinthetablebelow.The
sampleofsize20intheLohrtextwaswithreplacement,therewere4accountsthatwere
repeatedinthesampleintheLohrtext,andthese4accountswereremovedforthe
purposeofthispaper.
6
Table1:
Account Book Value Audit Value BV-AV
3 6842 6842 0
9 16350 16350 0
13 3935 3935 0
24 7090 7050 40
29 5533 5533 0
34 2163 2163 0
36 2399 2149 250
43 8941 8941 0
44 3716 3716 0
45 8663 8663 0
46 69540 69000 540
49 6881 6881 0
55 70100 70100 0
56 6467 6467 0
61 21000 21000 0
70 3847 3847 0
74 2422 2422 0
75 2291 2191 100
79 4667 4667 0
81 31257 31257 0
InitialExploration
Wefirstlookedatthesamplesizerequiredfor inahypothesistestwhilecontrollingfora
significancelevelof.05( )andpowerequalto.95( ).WechosebothPoisson
andBinomialmodelsforourinitialexploration.Thepoissondistributionisappliedin
countingthenumberofrareevents,whichinthecontextofauditingdatawearemodeling
theoccurrenceofthebookvaluenotbeingequaltotheauditvalue.Areasonablemodelfor
7
initialexplorationis , =1,…,N,whichimpliesthatthemeanofthe
differenceofthebookvalueandauditvalueisthebookvaluemultipliedbytheerrorrate.
TheBinomialmodelwasalsochosenforinitialexplorationbecausethebinomial
distributioniseasilyapproximatedbythenormaldistribution,andherethethoughtwould
bethatthebookvalue isthenumberofdollarsinaparticularaccountandeachdollar
hasprobability ofbeingmateriallymisstated.Thismodelwouldbeexpressed
by =1,…,N.Thebinomialdistributiondoesnotallowforthelarge
numberof thatwehaveinoursample,butasinitialexploration,theresultsunder
thebinomialmodelcanbecomparedwiththeresultsfromthepoissonmodelusingsimilar
methodstomakecomparisonsandhelparguethatourresultsarereasonable.(Sahuand
Smith,2006)investigatetheuseofthenormaldistributionwheretheassumptionsof
normalityarenotappropriate.Athoroughdiscussionofconfidenceintervalcriteriais
givenin(Jiroutek,Muller,KupperandStewart,2003).Usingdecisiontheorytoselectand
appropriatesamplesizeiscoveredintwopapersbyMenzefricke(1983and1984).In
sections1and2ofthispaperweusethepoissonandbinomialmodelsrespectivelyfora
frequentistapproximation(section1)andaBayesianmethodofapproximation(section2).
Section3brieflydiscussestheintervalfor fromtheposteriordistributionofthepoisson
model.AdiscussionofBayesianmodelperformancecriteriaisgivenin(WangandGelfand,
2002).Inourpaper,thisinitialexplorationismovingtowardstheintroductionoftheZero‐
InflatedPoissonmodelthatwillbediscussedinsection4.
1.FrequentistApproximation
Firstexplorationinvolveslookingatthesamplesizerequiredforconfidenceintervals
underboththePoissonmodelandtheBinomialmodel.Here,wechosetocontrolforthe
lengthoftheinterval,weallowedL(length)tobeintheinterval(.001,.02).Thisinvolved
usingestimatesfromthedataset.Aninitialestimateforthemeanerrorperdollar ,
called wascalculatedfromthesampletobe ,withastandard
deviation .Theformulasusedtocalculatethemeanandthevariancewere
8
and ,CasellaandBerger(2002).Theresultsfor
thesamplesize thatresultfromthisinitialexplorationundertherespectivemodelsin
sections1.1and1.2below.
1.1FrequentistApproximationfornunderthePoissonModel:
Here, =1,…, .
Equation(1)computesa( )%confidenceintervalfor .Here,
,(1)
where isusedtorepresentthez‐scoreforthe percentileofthenormal
distribution.However,wearenotlookingtocomputeaconfidenceintervalfor ,but
ratherwearelookingtohaveanintervalfortheappropriatesamplesize .Thedistance
ofthetwoendpointsofthisinterval, to isthelength.
Sinceourintervaliscenteredaround ,wecanuse2multipliedbythedistanceof to
tocalculatethelength.
Thisimpliesthatthelengthoftheintervalresultingunderthismodelcanbedescribedby
thefunction
.(2)
9
Thealgebratodemonstratetheintermediatestepscanbefoundinsection1.1ofthe
appendix.
Wealsoknowthattheexpectationof and basedonthePoissondistributionwillbe
and .Fillingthesevaluesoftheexpectationinfor and
weareabletoaccountfornothavingthisinformation.Aftersubstitutingthese
expectationsintoourequationfor ,andsolvingfor ,theresultis
.(3)
InFigure1below,theresultsforthesamplesizeversuslengthareillustrated.
Figure1:
FrequentistApproximationofSSUnderThePoissonModel
n n
L
10
Figure1demonstratesthatforanintervallengthlessthan0.007thesamplesizeisincreasingas
thelengthbecomessmaller.ALengthoftheintervalgreaterthan0.007wouldrequireasample
sizeof1.
1.2FrequentistApproximationfornundertheBinomialModel:
Usingthesamemethodasabove,thesamplesizeiscalculatedundertheBinomialmodel.
Weseektohaveanestimateforthesamplesize thatissimilartotheestimatewe
calculatedunderthePoissonmodel.Here,
=1,…, .
UndertheBinomialmodel,equation(4)showstheintervalfor ,
.(4)
ThisimpliesthatthelengthoftheintervalundertheBinomialmodelisexpressedasin
equation(5)is
.(5)
Intermediatestepsarefoundinsection1.2oftheAppendix.Wealsoknowthat
and ,thenwesubstitutetheseexpectations
intoourequationfor and .Asbefore,intermediatealgebraicstepscanbefoundin
11
section1.2oftheAppendix.Aftersolvingfor ,wegetthefollowingequationfor ,where
isrepresentedby
,(6)
and,
.(7)
Ourworkwasverifiedbyusingasimilar,simplermethodforthebinomialmodelabove,
becausethealgebratoarriveatthisresultwasextensive.Section1.3isthegeneraloutline
ofanotherapproachunderthebinomialmodelandthismethodalsoresultedinasimilar
estimatedsamplesize.
1.3FrequentistApproximationtothesimplerBinomialmodel:
If , =1,…, ,thisimpliesthatthesumofthe willbedistributedas
.
Wehave, .
Aconfidenceintervalisfirstsetupfor .However,tosimplifycalculationswechoosetolet
.Therepresentationofthisisgivenas,
,(8)
where ,then
12
.(9)
Calculationssimilartothoseoftheintervalsfor resultintheformulaforlengthgivenin
equation(10),
.(10)
Sinceweknowthat wecancontinuesimilarlytothe
previouspoissonandbinomialintervals,bysubstitutingintheexpectationforX.The
resultsaresimilartothepreviousestimationofsamplesizeinthemodelsabove.Also
includedisFigure3thatdisplaystheratioofestimatedsamplesizeversusthelengthofthe
interval.
Figure2:
FrequentistApproximationofSSUnderTheBinomialModel
n
L
13
Figure2hasvaluesofsamplesizethatareslightlygreaterthanthevaluesofsamplesizeinFigure
1foragivenlength.Howeverthesevaluesaresimilar,andthisismoreeasilyexpressedinthe
ratioplotFigure3.Itwouldbeexpectedthatthevaluesofsamplesizeforcorrespondinglength
wouldbesomewhatsimilarunderthepoissonandbinomialmodels.
Figure3:
ItcanbenoticedbycomparisonofFigure1andFigure2,thattheestimatedlengthofthe
PoissonintervalissmallerthantheBinomialintervalforanysamplesize,untilalengthof
.013.Theratioofsamplesize,displayedatFigure3ofthetwomodelsisabout1aftera
lengthof.013,whichmeansthatatalengthof.013therequiredsamplesizeisthesame.
2.BayesianMethodsforSampleSizeDetermination
Next,wewilltakealookatBayesianmethodsofapproximation.LetY(n)=(X1,X2,…..Xn)be
arandomsampleofsizen,wewilllookatthehighestposteriordensityregionundertwo
models.Inpart2.1assumingapopulationwithdensity , =1,…, ,
andpriordistribution ofunknown ,wewillcalculatethePosterior
RatioofSampleSizevs.Length
L
14
distributionfor andthencontinuetofindanestimateforthesamplesize usinga
normalapproximationtotheposteriordistribution.McCray(1984)proposedaBayesian
modelforevaluatingdollarunitsamples,evenifaninformativepriorprobability
distributionontheexpectedtotalerrorisnotavailable.Insection2.2themodelassumeda
populationdensity =1,…, ,withprior .Underthe
binomialmodelwewillproceedbyagainfindingtheposteriordistributionfor andthen
usinganormalapproximationtofindanintervalfor .
2.1CalculatingthePosteriorDistributionfor underthePoissonModel:
InBayesianstatistics,theposteriordistributionisequaltothepriordistributiontimesthe
likelihood.WehavealreadystatedthatthelikelihoodunderthePoissonmodelis
, =1,…, ,andweareusingprior .Thisimpliesthat
theposteriordistributionfor willthereforebe .Giventhat
wehaveanestimatefor , fromourdata,withstandarddeviationestimate
,wecanusethisinformationtosolveforourinitialestimatesof and .
Sincethepriorfollowsagammadistribution,wecansetupequationsforthemeanand
varianceas ,and .Aftersolvingfor and , ,
and .Intermediatestepsarefoundinsection2.1oftheAppendix.Afterfilling
thesevaluesfor and ,theposteriordistributionwillbedistributedas
.However,wewillstillneedinformationfor ,and
isunknown.
2.2Solvingforn,usingtheNormalApproximationforthePosteriorof underthe
PoissonModel:
15
Underthepoissonmodel,theposteriordistributionfor givenyisapproximatelynormal
withmean andvariance .Themeanandvariancewere
easilyderivedbyusingtheformulaforthemeanandvarianceofaGammadistribution.
Thus,
, =1,…, .(11)
Wewanttochooseasamplesizefor in sothatwehaveatleast
100( )%confidenceforthetruedifferencebetweenthebookvalueandtheaudited
value,givenaspecifiedlengthandtotalbookvalue.Theintervalfor isfrom
to ,becauseListhetotallengthoftheinterval,and
thenormaldistributionissymmetric.Thefollowingequationisforfindingthesmallest
areaunderourmodelthatisatleast100( )%confidentfor .Theintegration
involvesaveragingover sothatwearenolongerdealingwiththeposterior
distribution ,butratherafunctionof .Theformularepresentationforaveragingover
isgiveninequation(12),
.(12)
Insection3,theappropriatesamplesize iscalculatedfortheposteriordistributionof giventheinformationfory.However,inthispartofthepaperweareaveragingovery,so
thatthisintervaliscalculatedfor usingsimilarinformationtotheintervalgivenin
section1.Theposteriordistributionof isapproximatelynormal,sowecanmakeuseof
thepropertiesofthenormaldistributiontosimplifythecalculations.Thus,
16
.(13)
Herethemeancancelsinthenumerator,whichleadstothemuchmoresimplifiedlooking
equation(14),
.(14)
MonteCarlointegrationwasusedtogettheestimatefor ,usingthefactthat
, =1,…, ,samplesizesbetween100and500wereused,andwitheach
samplesize ,10,000simulationsweredrawn.Theoptimalvaluefor wasfoundtobe
=16.Similarresultswerefoundundertheposteriordistributionof undertheBinomial
model,andarecalculatedandcomparedinsection2.4.
2.3CalculatingthePosteriorDistributionfor undertheBinomialModel:
Forthe , =1,…, ,modelwithprior ,theposterior
distributionisagainthepriortimesthelikelihoodandresultsina
distribution.Again,theposteriordistributiondependson
parametersthatwecanestimate.Tosolvefortheparametersofthebetapriorwewilluse
theknownequationsforthemeanandvariance,alongwithourinitialestimatesfor and
.Themeanofthebetaprioris andthevarianceofthebeta
distributionisgivenby .Solvingthissystemofequations
17
theresultingparameterestimatesare and .Detailedcalculationof
meanandvariancearegiveninsection2.3oftheAppendix.Afterfillingintheseestimates,
theresultingposteriordistributionis .Dueto
theunknowninformationfor ,lookingatthenormalapproximationtothebetaposterior
isagainofinterest.
2.4Solvingforn,usingtheNormalApproximationforthePosteriorof underthe
BinomialModel:
Underthebinomialmodel,theposteriordistributionfor givenyisapproximatelynormal
withmean andvariance .The
meanandvariancewereeasilyderivedbyusingtheformulaforthemeanandvarianceofa
betadistribution.Thus,
,=1,…, .
(15)
Again,proceedingaswedidinsection2.1above,wewanttochooseasamplesizefor in
sothatwehaveatleast100( )%confidenceforthetruedifferencebetweenthe
bookvalueandtheauditedvalue,givenaspecifiedlengthandtotalbookvalue.The
intervalfor isfrom to .Thus,
,=1,…, .
(16)
18
Again,weareproceedingwiththesamemethodsasusedaboveforthegammaposterior
distribution,byusingknownpropertiesofthenormaldistribution.Here,
,
(17)
canbewrittenas,
.(18)
Herethemeancancelsinthenumeratorjustaspreviouslyseeninthegammaposterior
distribution.Thiscancelationleadstothemuchmoresimplifiedlookingequation(19),
.
(19)
UsingMonteCarlointegrationtogettheestimatefor ,usingthefactthat
, =1,…, ,samplesizesbetween200and600wereused,and
witheachsamplesize ,10,000simulationsweredrawn.Theoptimalvaluefor was
alsofoundtobe =16,thiswasthesamesamplesizethatwasfoundtobeoptimalunder
thepoissonmodel.
Figures4and5belowareplotsofsamplesizevs.varying confidencelevels.Thiswas
foundusingtheformulaforthenormalapproximation ,
underthetwodifferentmodelsrespectively.BothFigures4and5areforfixedL,L=0.001
and , =7043.
19
Figure4:
Figures4and5demonstratethatthe95%confidencelevelisachievedforaninteger
samplesizeof16underboththenormalapproximationtothepoissonmodelandthe
binomialmodelwiththelengthheldataconstant.001andtheaveragebookvalue,
=7043.
Figure5:
SSDForNormalApproximationUnderThePoissonModelPosterior
SSDforNormalApproximationofSSUnderTheBinomialModelPosterior
n
n
20
Figures6and7demonstrateasequenceofvaluesforlengthvaryingfrom.001to.022,
wherethesamplesizeisthesmallestintegervaluethatsolvestheintegralatgreaterthan
.Soalthoughthe valuesforeachsamplesizeversuslengtharenotthesame,the
integervaluethatsolvesfortheequationisthesameinbothFigures6and7.Thisalso
meansthattheratioofsamplesizebetweenthetwodifferentmodelswillbe1forallvalues
oflength.
Figure6:
Figures6and7havethesameshapeasFigures1and2.Thisimpliesthatthesamplesize
vs.lengthunderthefrequentistmethodusedandtheBayesianmethodareproducing
similarresults.Figure7followsonthenextpage,howeveritiseasilynoticedthatFigures
6and7arequitesimilar,andthisisduetointegervaluesforsamplesizebeingusedas
input.
NormalApproximationUnderPoissonModelvs.Length
n
L
21
Figure7:
AsthelengthgetssmallerthanL=0.002inbothFigures6and7,thesamplesizerapidly
increases.
Section3:
BayesianEstimatesforthePosteriorDistributionof
Simplyusingtheposteriordistributionfromthenormalapproximationtothepoissonmodelin
section2.1,weareabletosetuptherelationshipbelow,
.
Theestimates.0996and14.29areestimatesfor and respectivelyfromthepriordistribution
.Theestimatesfor and arecalculatedinsection2.1oftheAppendix.Simple
algebraicmanipulationoftheaboveformulaleadstothefollowinginequalityfor ,
NormalApproximationUnderBinomialModelvs.Length
n
L
22
.
InTableAbelow,ifintheaboveinequalityfor we
allowthevaluefor tovary,whilesetting =4226asaninitialestimate,holding
asfixed,andL=.001asfixedthecorrespondingvaluesforsamplesizearegiven.
InTableBbelow,similarlyifweusetheinequality and
hold constantat1.96andallowlengthtovary,thecorrespondingvaluesforsample
sizearegiven.Foravalueof slightlygreaterthan1.96,thechoiceofconstantvaluein
sections1and2,wehaveacorrespondingsamplesizeof36.55.Thisvalueislargerthan
thenominalsamplesizeunderthefrequentistmethodandpreviousBayesianmethod
wherethevaluesof wereaveragedover.Also,itisnoticedthatasthelengthvaries
(TableB)thevaluescloselymimicthevaluesdepictedbyFigures1and6,thesefigures
correspondtothefrequentistpoissonapproximationandtheBayesianpoisson
approximationrespectively.
23
TableA: TableB:
Sample Size Length
Sample Size
1.28 23.63 0.001 36.18 1.38 25.47 0.002 18.09 1.48 27.32 0.003 12.06 1.58 29.16 0.004 9.04 1.68 31.01 0.005 7.23 1.78 32.86 0.006 6.03 1.88 34.7 0.007 5.17 1.98 36.55 0.008 4.52 2.08 38.4 0.009 4.02 2.18 40.24 0.01 3.62 2.28 42.09 0.011 3.29 2.38 43.93 0.012 3.01 2.48 45.78 0.013 2.78 2.58 47.63 0.014 2.58 2.68 49.47 0.015 2.41 2.78 51.32 0.016 2.26 2.88 53.16 0.017 2.12 2.98 55.01 0.018 2.01 3.08 56.86 0.019 1.9
OnthefollowingpagearetheFigures8and9forvaryinglengthvs.samplesizeforthe
posteriordistributionof underthetworespectivemodels.
24
Figure8:
Figures8and9alsohavethesameshapeasFigures1and2.Figure8(theBayesian
poissonmethod)hasslightlylargervaluesthanFigure1(thefrequentistpoissonmethod),
however,Figure8usesinformationforythatwedonothaveavailabletousebeforethe
sampleischosen.ThisissimilarlythecaseifwecompareFigure9(theBayesianbinomial
method)toFigure2(thefrequentistbinomialmethod).
Figure9:
NormalApproximationtoPosteriorDistributionof inThePoissonModel
NormalApproximationtoPosteriorDistributionof inTheBinomialModel
nn
L
L
25
4.ZeroInflatedPoissonModel
Zero‐InflatedPoisson(ZIP)isamodeltoaccommodatedatawithexcesszeros.Itassumes
withprobability theonlypossibleobservationis0(oratruezero),andwithprobability
a =1,…, ,randomvariableisobserved.Ifacompanykeepsaccurate
accountsreceivable,therewillbenoerrors,andthisimpliesthattherewillbemanyzeros.
AlthoughthePoissondistributionincludeszero,therearemorezerosinthedatathan
appropriateforusingthestandardPoissonmodel.TheZIPmodelwillproduceamore
preciseestimate.AthoroughexplanationoftheZIPmodel,fittingZIPregressionmodels
andthesimulatedbehavioroftheirpropertiesinmanufacturingdefectdataisgivenin
(Lambert,1992).
IntheZIPmodel,theresponsesZiareindependentand=1,…, .
Ourmodelfor underthezeroinflationisgivenby .Belowisatable
forthejointdistributionforZiand .Thejointdistributiondefinestheprobabilityof
eventsintermsofbothZiand .
0
ThelikelihoodfunctionundertheZero‐InflatedPoissonmodelisthereforegivenby
equation(20),
,=1,…, .
(20)
26
Underthismodel, and arenuisanceparametersandneedtobeestimated.Estimatesfor and
,namely and canbeestimatedby where isthecovariancematrix
of and ,andiscalculatedbasedonthesecondpartialderivativeswithrespectto and .The
secondpartialderivativesaregivenasequations(21),(22)and(23).Detailedstepswhichledto
theseresultsaregiveninsection4oftheAppendix.Thesecondpartialderivativesaregivenby,
,
(21)
,
(22)
and
.
(23)
ThesearetheelementsoftheHessianmatrix,andthesewillbeusedtosolveforthe
covariancematrix and bytakingtheoppositeoftheHessianmatrixandthensolving
fortheinverse.Oncewehavedeterminedthecovariancematrix,onlythe(1,1)elementof
thematrixisofinterest,becauseitisthevarianceoftheta.Detailedstepscanbefoundin
section4oftheAppendix.Theestimateof isapproximatelynormal,andisgivenby,
,
27
where .
(24)
Weneedtogetestimates and forthenuisanceparameters and beforewecan
continue.Numericalmethodswereusedduetothecomplexityoftheproblem.TheEM‐
algorithmyieldedestimatesof =.78882and =.01113,andtheNelder‐Meadmethod
yieldedestimatesof =.79991and =.01143.Thiswascalculatedbyfirstfinding
estimatesforthe ’s.Findingestimatesforthe ’swasnecessarybecause isan
indicatorvariablethereforethesequencewouldnotconvergewellifitwerenotfirst
estimatedbytheequations,
and .
The ’sfollowaBernoullidistribution ,=1,…, .
Theexpectationforthe ’saregivenbytheequations,
and .
Theexpectationfor canthenbepluggedintotheequationsfor and ,andthenthe
numericalmethodswereperformedandresultedintheestimatespreviouslystated.After
obtainingestimatesforthenuisanceparameters,theEMmethodispreferredtotheNelder‐
MeadmethodsotheEMestimateswereusedincalculatingthevariance. isdistributed
as
28
.
Wecallthisvariance“a”anduseourestimates , , ,
tosolvefor ,weusethisfactorin =1.687012toinvestigatethe
samplesizegivenintheZIPmodelincomparisontothesamplesizeunderthestandard
poissonmodel.Thiswaywewillhaveresultssimilartothosethatweregivenunderthe
frequentistpoissonmodel,buttheywillbescaledbyafactor tomakeforeasein
comparisonofthetwomodels.Supplementarycalculationscanbefoundinsection4ofthe
Appendix.Thecalculationsforthesamplesize undertheZIPmodelfollowsimilarlyto
thefrequentistmethodusedinsection1,butbeginningwith
.
(25)
Multiplyinganddividingthroughby
sothatwewillhaveasamplesizethatis
easilycomparedtothePoissonfrequentistmodelresultsin
.
(26)
Anintervalfor
willbe
.
(27)
Aftersomealgebraicmanipulation,withappropriateintermediatestepsfoundinsection4oftheAppendix,thelengthoftheintervalundertheZIPmodelis
29
.
(28)
Afterrearrangingthelengthequationtosolvefor ,weareleftwithequation(29),
.
(29)
Thesupportingalgebraisfoundinsection4oftheAppendix.Itisnoticedthattheinterval
for undertheZero‐InflatedPoissonmodelisverysimilartotheintervalforunderthe
frequentistPoissonmodel.Theonlydifferencebetweenthetwoequationsforisafactor
a*.Figure10belowdisplaystheZero‐InflatedPoissonmodelcomparingthesamplesize
vs.length.
Figure10:
SSDUnderTheZero‐InflatedPoissonModel
L
n
30
ThegraphfortheZIPmodel(Figure10)hasaslightlyhighersamplesizeforanygiven
lengththanunderthepoissonfrequentistmodel(Figure1).UndertheZIPmodelwe
expecttheretobeincreasedprecision,however,thisincreasedprecisionrequiresalarger
samplesizetoattain.
Conclusion:WeproposedamethodusingtheZero‐InflatedPoisson(ZIP)modelwhichexplicitlyconsiders
zeroversusnon‐zeroerrors.Thismodelisfavorableduetotheexcesszerosthatarepresentin
auditingdatathatthestandardpoissonmodeldoesnotaccountfor.Thismethodcouldeasilybe
extendedfordatasimilartoaccountingpopulations.Theestimatednecessarysamplesizewas
largerundertheZIPmodelthanunderthestandardpoissonmodel.However,duetotheexcess
zerosinthedataset,itwouldbereasonabletoassumethatthelargersamplesizeisnecessaryfor
increasedprecision.Furtherresearchandinvestigationisneededtoexaminemorepreciselythe
benefitsundertheZIPmodel.Belowisatablesummarizingtheresultsofthemethodsfromthe
foursectionsofthispaperwhenthelengthisL=0.001,
€
α isfixedat , ,and
€
b = 7043.
Poisson Binomial ZIP Frequentist 15.29 22.72 41.02 Bayesian 16 16 Bayesian 36.18
Theratioofthepoissonfrequentistsamplesizetothebinomialfrequentistsamplesizeis0.67.
Figure3insection1ofthepaperdepictedthefrequentistbinomialsamplesizebeinglarger,until
thelengthoftheintervalreached0.013.Theposteriordistributionof forthepoissonmodel
hadanominalsamplesizeof
€
n = 36.18,thiswasthevalueclosesttoourZIPsamplesizeestimate
of
€
n = 41.02.However,asstatedinsection3ofthispaper,informationforywasusedthatwould
notbeavailablebeforedecidingthenumberofaccountstoselectforthesample.Theestimates
forthesamplesizeundertheBayesianmethodwhereMonteCarlointegrationwasimplemented
resultedinasamplesizeof16underboththepoissonandbinomialmodels.Thiswasbecausewe
31
chosethefirstintegersamplesizewithalengthequalto0.001tohavegreaterthan.95confidence
asouroptimalsamplesize.
32
REFERENCES
Baker,RobertL.,Copeland,RonaldM.,(1979).EvaluationoftheStratifiedRegressionEstimatorforAuditingAccountingPopulations.JournalofAccountingResearch,17:606‐617.Berg,Nathan.(2006).ASimpleBayesianProcedureforSampleSizeDeterminationinanAuditofPropertyValueAppraisals.RealEstateEconomics,34:133‐155.Casella,G.,Berger,R.L.,(2002).StatisticalInference2ndEd.DuxburyPress.Cockburn,I.M.,Puterman,M.L.,andWang,P.(1998).AnalysisofPatentData–AMixed‐Poisson‐Regression‐ModelApproach.JournalofBusiness&EconomicStatistics,16:27‐41Higgins,H.N.,Nandram,B.,(2009).MonetaryUnitSampling:ImprovingEstimationoftheTotalAuditError.AdvancesinAccounting,IncorporatingAdvancesinInternationalAccounting,25,2:174‐182Jiroutek,M.R.,Muller,K.E.,Kupper,LawrenceL.,Stewart,P.W.,(2003).ANewMethodforChoosingSampleSizeforConfidenceInterval‐BasedInferences.Biometrics,59:580‐590.Kaplan,RobertS.,(1973).StatisticalSamplinginAuditingwithAuxiliaryInformationEstimators.JournalofAccountingResearch,11:238‐258.Knight,P.,(1979).StatisticalSamplinginAuditing:AnAuditor’sViewpoint.JournaloftheRoyalStatisticalSociety,28:253‐266.Lambert,D.,(1992).Zero‐InflatedPoissonRegression,WithanApplicationtoDefectsinManufacturing.Technometrics,34:1‐14Lohr,S.L.,(1999).Sampling:DesignandAnalysis.NewYork:DuxburyPress.McCray,JohnH.,(1984).AQuasi‐BayesianAuditRiskModelforDollarUnitSampling.TheAccountingReview,LIX,No.1:35‐41.Menzefricke,U.,(1983).OnSamplingPlanSelectionwithDollar‐UnitSampling.JournalofAccountingResearch,21:96‐105.Menzefricke,U.,(1984).UsingDecisionTheoryforPlanningAuditSampleSizewithDollarUnitSampling.JournalofAccountingResearch,22:570‐587.Ponemon,LawrenceA.,Wendell,JohnP.,(1995).JudgementalVersusRandomSamplinginAuditing:AnExperimentalInvestigation.Auditing:AJournalofPracticeandTheory,14:17‐34.
33
Rohrback,KermitJ.,(1986).MonetaryUnitAcceptanceSampling.JournalofAccountingResearch.24:127‐150.Sahu,S.K.,Smith,T.M.F.(2006).ABayesianMethodofSampleSizeDeterminationwithPracticalApplications.JournaloftheRoyalStatisticalSociety,169:235‐253.Wang,F.,andGelfand,A.E.,(2002).ASimulation‐BasedApproachtoBayesianSampleSizeDeterminationforPerformanceUnderaGivenModelandforSepartingModels.StatisticalScience,17:193‐208
34
APPENDIX
Section1:1.1FrequentistApproximationtothePoissonmodel:
,i=1,…, .
(1)
Thelengthoftheintervalforthetawillbe:
(2)
isestimatedby
Aftersubstitutinginfor ,wecansolveforn:
35
(3)
1.2FrequentistApproximationtotheBinomialmodel:
,i=1,…, .
(4)
Thelengthoftheintervalforthetawillbe:
(5)
Andtheexpectationof Thisimpliesafterfillinginfor and that:
36
Aftermultiplyingthroughtopandbottombyn2theresultingequationis:
Simplyrearrangingtheequationandtakingthesquarerootofbothsidesresultsin:
Wewillcallthesquarerootterm“a”
(6)
Andwecontinuesolvingtheequationforn:
37
(7)
Andproceedusingasimpleiterativealgorithmtogetvaluesfornforvaryinglengths.
1.3FrequentistApproximationtothesimplerBinomialmodel:
(8)
Let
(9)
Thelengthoftheintervalforthetawillbe:
38
Takeout2n*Zfromundertheradical,then*inthenumeratorthencancelswiththen*in
thedenominator.
(10)
Nowtouse asanestimateforX,weknowthat
Andaftersubstitutinginwewouldhave
However,abetterapproximationwouldinvolve:
Andaftersubstitutionweareleftwith
Let ,andaddandsubtract1fromtheleft‐handsideoftheequation:
39
Thensquarebothsides:
Thisimplies:
Section2:2.1BayesianNormalapproximation,underthePoissonmodeltotheGammaPosteriorDistribution:
,i=1,…, ,whichcanbewrittenas i=1,…, .InBayesianstatistics,thisiscalledthelikelihood.
40
Weassumepriordistribution ThePosteriordistribution isproportionaltothePrior*Likelihood
Solvingfor and giventhat and :
nowsubstituteintosolvefor :
2.2Solvingforn,usingtheNormalApproximationforthePosteriorof underthe
PoissonModel:
41
(11)
(12)
(13)
(14)
2.3BayesianNormalapproximation,undertheBinomialmodeltotheBetaPosteriorDistribution:
,i=1,…, .
ThePosteriordistributionagainisgivenby:
Solvingfor and giventhat and
42
Simplysolvingthesystemofequationsresultsin =.09271and =13.1514.
2.4Solvingforn,usingtheNormalApproximationforthePosteriorof underthe
BinomialModel:
(15)
43
(16)
(17)
(18)
(19)
Section4:ZeroInflatedPoissonmodel:FirstSolvingforthesecondpartialderivativesofthelikelihoodfunction:
Thelikelihoodfunctionis
(20)
44
(21)
Let thenbythequotientrule:
(22)
Let thenbythequotientrule:
45
(23)
TheHessianMatrixwillthereforebe:
WefirsttakethenegativeoftheHessianMatrix:
Next,toobtaintheinverseofthismatrix:
46
The(1,1)elementofthismatrixwillbe:
(24)Wecanuse ,aftersubstitutingthisintotheequation:
Although ,itisapproximately,sowewilluse tosubstituteintoourequations.
Usingcommondenominator
47
Wecallthis“a”anduseourestimates , , , tosolve
for ,weusethisfactorin =1.687012toinvestigatethesamplesize
givenintheZIPmodelincomparisontothesamplesizeunderthestandardPoissonmodel.
(25)
(26)
(27)
Theformulaforthelengthwillbe:
(28)
48
Theintervalfornwillbe:
(31)
You’llnoticethatthisisverysimilartotheintervalforninthefrequentistPoissoncase,exceptisitscaledbyafactorof“a”.