Sample Size Determination in Auditing Accounts Receivable Using a Zero-Inflated Poisson Model

By Kristen Pedersen

A Project Submitted to the Faculty of Worcester Polytechnic Institute in partial fulfillment of the requirement for the Degree of Master of Science in Applied Statistics

May 2010

APPROVED:

Dr. Balgobin Nandram, Project Advisor

Dr. Huong Higgins, Co-Advisor

ABSTRACT

In the practice of auditing, a sample of accounts is chosen to verify whether the accounts are materially misstated, as opposed to auditing all accounts; it would be too expensive to audit all accounts. This paper seeks a method for choosing a sample size of accounts that will give a more accurate estimate than the methods for sample size determination currently in use. Methods for determining sample size are reviewed under both the frequentist and Bayesian settings, and then our method using the Zero-Inflated Poisson (ZIP) model, which explicitly considers zero versus non-zero errors, is introduced. This model is favorable because of the excess zeros present in auditing data, which the standard Poisson model does not account for, and it could easily be extended to data similar to accounting populations.


Introduction

When auditing a company's accounts receivable, a sample of accounts is chosen to verify whether the accounts are materially misstated. An auditor is responsible for selecting a sample of the reported book values and making inferences about the accuracy of the company's records based on the sample. "Accounting populations are frequently very highly skewed and the error rates which the auditor is seeking to detect are often extremely low" (Knight, 1979). This paper seeks a method for choosing a sample size of accounts that will provide a more accurate estimate than the methods for sample size determination currently in use. In this paper, all accounts receivable book values constitute the population. Book value is the value recorded for accounts or financial statements. A sample is a selection of some, but not all, of the accounts. The information gathered from the sampled accounts is used to make inferences about the population. The only way to get the total value of the accounts is to audit all accounts, but this would be very costly. Not having to verify the information on all accounts to make these inferences reduces the cost of calculating the quantity of interest. Experience has shown that a properly selected sample frequently provides results that are as good as the results from verifying all accounts (Higgins and Nandram, 2009). Dollar unit acceptance sampling to reduce the needed sample size in auditing data is thoroughly discussed in Rohrback (1986). Ponemon and Wendell (1995) studied the benefits of using statistical sampling methods in auditing data, as opposed to an auditor using his expertise to choose the sample. Although cost is not specifically addressed in this paper, it provides motivation for investigating what sample size is needed to get efficient results.

The Zero-Inflated Poisson (ZIP) model, introduced in section 4, is a model that accommodates count data with excess zeros. If a company keeps accurate accounts receivable, then there would be no errors; this means that there will be more zeros in the data than would be accounted for under the standard Poisson model. Therefore, we hope to get a better estimate of the appropriate sample size. In "Monetary Unit Sampling: Improving estimation of the total audit error" (Higgins and Nandram, 2009), the ZIP method is discussed, and it is shown that for accounting data and other similar data populations a bound under the ZIP model is reliable and more efficient than common Monetary Unit Sampling practice.

Related Research

Kaplan (1973) conducted simulation studies based on hypothetical populations and error patterns to observe the behavior of ratio and difference estimators when the population contains a limited number of zeros. Kaplan found a strong correlation between the point estimate and the estimated standard error, and showed that the nominal confidence level implied by the normal distribution for large-sample confidence intervals was frequently far different from the proportion of correct confidence intervals.

Baker and Copeland (1979) evaluated the use of stratified regression in comparison to standard regression in account auditing data. A minimum of 20 errors was found to be needed for the stratified regression to give superior results when the difference of book value and audit value is used as the estimator, and a minimum of 15 errors when a ratio estimator is used. The usefulness of this information is questionable because of the low error rate of accounting populations.

Sahu and Smith (2006) explored a full Bayesian framework in the auditing context. They found that non-informative prior distributions lead to very small sample sizes. Specifically, if the mean of the prior distribution is far from the boundary value (or the per item material error), then the sample size required is very small. In this case, the sample size could be set by the auditor. When the prior mean is close to the material error, a large sample size is required.

Berg (2006) proposed a Bayesian technique for auditing property value appraisals. A Beta-Binomial model was implemented, and the procedure required smaller sample sizes relative to those based on classical sample size determination formulas.


Data

The data come from Lohr (1999), where the recorded values b_1, b_2, b_3, ..., b_n for a sample of n = 20 accounts of a company were listed, along with the audited (actual) values a_1, a_2, a_3, ..., a_n of the sample. The company had a total of N = 87 accounts receivable, and the total book value for the N = 87 accounts receivable was $612,824. The total book value of all accounts receivable for a company is B = b_1 + b_2 + ... + b_N, and the total audit value is A = a_1 + a_2 + ... + a_N. We define the error for each account i (i = 1, 2, 3, ..., N) to be the difference between the book value and the true audit value, b_i − a_i. This means that the total amount of error for the N accounts is B − A. In the accounting context, we expect a large number of accounts to have zero error. Let θ be the error rate per dollar, so that the expected error of account i is b_i θ, for i = 1, ..., n. Our initial estimate for θ and its standard deviation are the mean and standard deviation calculated from the sample in the Lohr text. A random sample of 20 accounts with replacement was taken from the population of 87 accounts. The book value, the audit value, and the difference between the book value and audit value for 16 of the accounts are listed in the table below. The sample of size 20 in the Lohr text was drawn with replacement; 4 accounts were repeated in the sample, and these 4 accounts were removed for the purpose of this paper.

Table 1:

Account   Book Value   Audit Value   BV-AV
3         6842         6842          0
9         16350        16350         0
13        3935         3935          0
24        7090         7050          40
29        5533         5533          0
34        2163         2163          0
36        2399         2149          250
43        8941         8941          0
44        3716         3716          0
45        8663         8663          0
46        69540        69000         540
49        6881         6881          0
55        70100        70100         0
56        6467         6467          0
61        21000        21000         0
70        3847         3847          0
74        2422         2422          0
75        2291         2191          100
79        4667         4667          0
81        31257        31257         0
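As a quick check on the structure of the data, the rows of Table 1 can be summarized in a few lines of Python. This is only an illustrative sketch: the names `TABLE_1` and `summarize` are hypothetical, and the numbers are copied from the table as printed.

```python
# Rows of Table 1 as printed: (account, book value, audit value).
TABLE_1 = [
    (3, 6842, 6842), (9, 16350, 16350), (13, 3935, 3935), (24, 7090, 7050),
    (29, 5533, 5533), (34, 2163, 2163), (36, 2399, 2149), (43, 8941, 8941),
    (44, 3716, 3716), (45, 8663, 8663), (46, 69540, 69000), (49, 6881, 6881),
    (55, 70100, 70100), (56, 6467, 6467), (61, 21000, 21000), (70, 3847, 3847),
    (74, 2422, 2422), (75, 2291, 2191), (79, 4667, 4667), (81, 31257, 31257),
]


def summarize(rows):
    """Return the row count, total book value, total error and share of zero errors."""
    errors = [book - audit for _, book, audit in rows]
    zero_count = sum(1 for e in errors if e == 0)
    return {
        "rows": len(rows),
        "total_book": sum(book for _, book, _ in rows),
        "total_error": sum(errors),
        "zero_fraction": zero_count / len(rows),
    }


summary = summarize(TABLE_1)
print(summary)
```

Most of the listed accounts carry no error at all, which is the excess-zero pattern that motivates the ZIP model in section 4.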

Initial Exploration

We first looked at the sample size required for θ in a hypothesis test while controlling for a significance level of .05 (α = .05) and power equal to .95 (1 − β = .95). We chose both Poisson and Binomial models for our initial exploration. The Poisson distribution is applied in counting the number of rare events; in the context of auditing data, the rare event we are modeling is the book value not being equal to the audit value. A reasonable model for initial exploration is a Poisson distribution with mean b_i θ for the error of account i, i = 1, ..., N, which implies that the mean of the difference of the book value and audit value is the book value multiplied by the error rate. The Binomial model was also chosen for initial exploration because the binomial distribution is easily approximated by the normal distribution. Here the thought is that the book value b_i is the number of dollars in a particular account and each dollar has probability θ of being materially misstated. This model would be expressed as a Binomial(b_i, θ) distribution for the error of account i, i = 1, ..., N. The binomial distribution does not allow for the large number of zeros that we have in our sample, but as initial exploration, the results under the binomial model can be compared with the results from the Poisson model, using similar methods, to help argue that our results are reasonable. Sahu and Smith (2006) investigate the use of the normal distribution where the assumptions of normality are not appropriate. A thorough discussion of confidence interval criteria is given in Jiroutek, Muller, Kupper and Stewart (2003). Using decision theory to select an appropriate sample size is covered in two papers by Menzefricke (1983 and 1984). In sections 1 and 2 of this paper we use the Poisson and binomial models for a frequentist approximation (section 1) and a Bayesian method of approximation (section 2). Section 3 briefly discusses the interval obtained from the posterior distribution of the Poisson model. A discussion of Bayesian model performance criteria is given in Wang and Gelfand (2002). In our paper, this initial exploration moves towards the introduction of the Zero-Inflated Poisson model that is discussed in section 4.

1. Frequentist Approximation

The first exploration involves looking at the sample size required for confidence intervals under both the Poisson model and the Binomial model. Here, we chose to control for the length of the interval; we allowed L (length) to be in the interval (.001, .02). This involved using estimates from the data set. An initial estimate θ̂ for the mean error per dollar θ was calculated from the sample, together with a standard deviation, using the usual formulas for the sample mean and the sample variance (Casella and Berger, 2002). The results for the sample size n from this initial exploration under the respective models are given in sections 1.1 and 1.2 below.

1.1 Frequentist Approximation for n under the Poisson Model:

Here, the error of account i is modeled as Poisson with mean b_i θ, i = 1, ..., n.

Equation (1) computes a 100(1 − α)% confidence interval for θ. Here,

, (1)

where z_{α/2} is used to represent the z-score for the 100(1 − α/2) percentile of the normal distribution. However, we are not looking to compute a confidence interval for θ; rather, we are looking for the appropriate sample size n. The distance between the two endpoints of this interval is the length L. Since our interval is centered around θ̂, we can use 2 multiplied by the distance from θ̂ to either endpoint to calculate the length.

This implies that the length of the interval resulting under this model can be described by the function

. (2)

The algebra demonstrating the intermediate steps can be found in section 1.1 of the appendix.

We also know the expectations, based on the Poisson distribution, of the sample quantities that appear in the length. Filling in these expected values accounts for not having this information before the sample is drawn. After substituting these expectations into our equation for L and solving for n, the result is

. (3)

In Figure 1 below, the results for the sample size versus length are illustrated.

Figure 1: Frequentist Approximation of SS Under The Poisson Model (sample size n versus length L)

Figure 1 demonstrates that for an interval length less than 0.007 the sample size increases as the length becomes smaller. An interval length greater than 0.007 would require a sample size of 1.
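The relationship between length and sample size can be sketched numerically. The block below is a minimal illustration, not the paper's equation (3): it assumes a Wald-type interval θ̂ ± z·sqrt(θ̂ / (n·b̄)), in which the sum of the sampled book values is replaced by its expectation n·b̄, so that solving L = 2z·sqrt(θ̂ / (n·b̄)) for n gives n = 4z²θ̂ / (L²·b̄). The function name `poisson_sample_size` and the value `THETA_HAT = 0.007` are assumptions made for illustration; b̄ = 7043 is the average book value used later in the paper.

```python
from math import ceil
from statistics import NormalDist


def poisson_sample_size(length, theta_hat, mean_book, conf=0.95):
    """Sample size n solving L = 2 * z * sqrt(theta_hat / (n * mean_book)).

    This interval form is an assumption made for illustration only.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 4 * z ** 2 * theta_hat / (length ** 2 * mean_book)


THETA_HAT = 0.007   # assumed illustrative error rate per dollar
MEAN_BOOK = 7043    # average book value reported in the paper

for length in (0.001, 0.002, 0.005, 0.01):
    n = poisson_sample_size(length, THETA_HAT, MEAN_BOOK)
    print(f"L = {length:.3f}   n = {n:7.2f}   rounded up: {ceil(n)}")
```

Under these assumptions the required n at L = 0.001 is a little over 15, and it falls off quickly as the allowed length grows, which is the qualitative behavior described for Figure 1.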

1.2 Frequentist Approximation for n under the Binomial Model:

Using the same method as above, the sample size is calculated under the Binomial model. We seek an estimate for the sample size n that is similar to the estimate we calculated under the Poisson model. Here, the error of account i is modeled as Binomial(b_i, θ), i = 1, ..., n.

Under the Binomial model, equation (4) shows the interval for θ,

. (4)

This implies that the length of the interval under the Binomial model is expressed as in equation (5),

. (5)

Intermediate steps are found in section 1.2 of the Appendix. We also know the expectations of the sample quantities under the Binomial model, and we substitute these expectations into our equation for L. As before, intermediate algebraic steps can be found in section 1.2 of the Appendix. After solving for n, we get the following equation for n, where n is represented by

, (6)

and,

. (7)

Because the algebra to arrive at this result was extensive, our work was verified by using a similar, simpler method for the binomial model. Section 1.3 gives the general outline of this other approach under the binomial model; it also resulted in a similar estimated sample size.

1.3 Frequentist Approximation to the simpler Binomial model:

If  , i = 1, ..., n, this implies that the sum of the X_i will be distributed as

.

We have,  .

A confidence interval is first set up for θ. However, to simplify calculations we choose to let  . The representation of this is given as,

, (8)

where  , then

. (9)

Calculations similar to those of the intervals for θ result in the formula for length given in equation (10),

. (10)

Since we know the expectation of X, we can continue similarly to the previous Poisson and binomial intervals, by substituting in the expectation for X. The results are similar to the previous estimates of sample size in the models above. Also included is Figure 3, which displays the ratio of estimated sample size versus the length of the interval.

Figure 2: Frequentist Approximation of SS Under The Binomial Model (sample size n versus length L)

Figure 2 has values of sample size that are slightly greater than the values of sample size in Figure 1 for a given length. However, these values are similar, and this is more easily seen in the ratio plot, Figure 3. It would be expected that the values of sample size for a corresponding length would be somewhat similar under the Poisson and binomial models.

Figure 3: Ratio of Sample Size vs. Length (ratio versus length L)

It can be seen by comparing Figure 1 and Figure 2 that the estimated length of the Poisson interval is smaller than that of the Binomial interval for any sample size, until a length of .013. The ratio of sample sizes of the two models, displayed in Figure 3, is about 1 after a length of .013, which means that at a length of .013 the required sample size is the same.

2. Bayesian Methods for Sample Size Determination

Next, we look at Bayesian methods of approximation. Let Y(n) = (X_1, X_2, ..., X_n) be a random sample of size n; we will look at the highest posterior density region under two models. In part 2.1, assuming a population with a Poisson density with mean b_i θ, i = 1, ..., n, and a gamma prior distribution for the unknown θ, we will calculate the posterior distribution for θ and then continue to find an estimate for the sample size n using a normal approximation to the posterior distribution. McCray (1984) proposed a Bayesian model for evaluating dollar unit samples, even if an informative prior probability distribution on the expected total error is not available. In section 2.3 the model assumes a Binomial(b_i, θ) population density, i = 1, ..., n, with a beta prior for θ. Under the binomial model we will proceed by again finding the posterior distribution for θ and then using a normal approximation to find an interval for n.

2.1 Calculating the Posterior Distribution for θ under the Poisson Model:

In Bayesian statistics, the posterior distribution is proportional to the prior distribution times the likelihood. We have already stated that the likelihood under the Poisson model is Poisson with mean b_i θ for account i, i = 1, ..., n, and we are using a gamma prior for θ. This implies that the posterior distribution for θ will also be a gamma distribution. Given that we have an estimate θ̂ for θ from our data, with a standard deviation estimate, we can use this information to solve for our initial estimates of the two prior parameters. Since the prior follows a gamma distribution, we can set up equations that equate its mean and variance to θ̂ and the squared standard deviation estimate. Solving these gives prior parameter estimates of .0996 and 14.29. Intermediate steps are found in section 2.1 of the Appendix. After filling in these values, the posterior distribution still depends on the sample information y, and y is unknown.
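The moment-matching step can be illustrated with a short sketch. It assumes the gamma prior is parameterized by shape and rate, so that the mean is shape/rate and the variance is shape/rate²; the function name `gamma_from_moments` is hypothetical. Because the numeric values of θ̂ and its standard deviation are not legible here, the example runs a round trip starting from the prior parameters .0996 and 14.29 quoted in section 3.

```python
def gamma_from_moments(mean, variance):
    """Shape and rate of the gamma distribution with the given mean and variance."""
    rate = mean / variance      # from mean = shape / rate, variance = shape / rate**2
    shape = mean * rate
    return shape, rate


# Round trip from the prior parameters quoted in section 3 of the paper.
shape0, rate0 = 0.0996, 14.29
mean0 = shape0 / rate0          # implied prior mean of theta
var0 = shape0 / rate0 ** 2      # implied prior variance of theta

shape, rate = gamma_from_moments(mean0, var0)
print(f"implied mean = {mean0:.5f}, implied sd = {var0 ** 0.5:.5f}")
print(f"recovered shape = {shape:.4f}, rate = {rate:.2f}")
```

The implied prior mean is roughly 0.007 per dollar with a much larger standard deviation, so the prior is strongly right-skewed.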

2.2 Solving for n, using the Normal Approximation for the Posterior of θ under the Poisson Model:

Under the Poisson model, the posterior distribution for θ given y is approximately normal, with mean and variance equal to the mean and variance of the gamma posterior. The mean and variance were easily derived by using the formulas for the mean and variance of a Gamma distribution. Thus,

, i = 1, ..., n. (11)

We want to choose a sample size n so that we have at least 100(1 − α)% confidence for the true difference between the book value and the audited value, given a specified length and total book value. The interval for θ extends a distance L/2 on either side of its center, because L is the total length of the interval and the normal distribution is symmetric. The following equation is for finding the smallest area under our model that is at least 100(1 − α)% confident for θ. The integration involves averaging over y, so that we are no longer dealing with the posterior distribution of θ given y, but rather with a function of n. The formula representation for averaging over y is given in equation (12),

. (12)

In section 3, the appropriate sample size n is calculated for the posterior distribution of θ given the information for y. However, in this part of the paper we are averaging over y, so that this interval is calculated for n using information similar to that used for the interval given in section 1. The posterior distribution of θ is approximately normal, so we can make use of the properties of the normal distribution to simplify the calculations. Thus,

. (13)

Here the mean cancels in the numerator, which leads to the much simpler equation (14),

. (14)

Monte Carlo integration was used to get the estimate for n, using the fact that the errors follow the Poisson model, i = 1, ..., n. Sample sizes between 100 and 500 were used, and with each sample size n, 10,000 simulations were drawn. The optimal value for n was found to be n = 16. Similar results were found for the posterior distribution of θ under the Binomial model, and are calculated and compared in section 2.4.

2.3 Calculating the Posterior Distribution for θ under the Binomial Model:

For the Binomial(b_i, θ) model, i = 1, ..., n, with a beta prior for θ, the posterior distribution is again proportional to the prior times the likelihood and results in a beta distribution. Again, the posterior distribution depends on parameters that we can estimate. To solve for the parameters of the beta prior we use the known equations for the mean and variance of a beta distribution, along with our initial estimates for θ and its standard deviation. Solving this system of equations, the resulting parameter estimates are .09271 and 13.1514. Detailed calculations of the mean and variance are given in section 2.3 of the Appendix. After filling in these estimates, the resulting posterior distribution still depends on the unknown sample information y, so looking at the normal approximation to the beta posterior is again of interest.
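The same moment-matching idea applies to the beta prior. The sketch below uses the standard beta mean a/(a + b) and variance ab/((a + b)²(a + b + 1)); the names `beta_from_moments`, `a0` and `b0` are hypothetical, and the round trip starts from the estimates .09271 and 13.1514 reported in section 2.3 of the Appendix rather than from the (illegible) sample estimates.

```python
def beta_from_moments(mean, variance):
    """Parameters (a, b) of the beta distribution with the given mean and variance."""
    total = mean * (1 - mean) / variance - 1   # total = a + b
    if total <= 0:
        raise ValueError("variance too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total


# Round trip from the beta prior estimates reported in the Appendix.
a0, b0 = 0.09271, 13.1514
mean0 = a0 / (a0 + b0)
var0 = a0 * b0 / ((a0 + b0) ** 2 * (a0 + b0 + 1))

a, b = beta_from_moments(mean0, var0)
print(f"implied mean = {mean0:.5f}, implied sd = {var0 ** 0.5:.5f}")
print(f"recovered a = {a:.5f}, b = {b:.4f}")
```

The implied mean is again close to 0.007, in line with the gamma prior used under the Poisson model.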


2.4 Solving for n, using the Normal Approximation for the Posterior of θ under the Binomial Model:

Under the binomial model, the posterior distribution for θ given y is approximately normal, with mean and variance equal to the mean and variance of the beta posterior. The mean and variance were easily derived by using the formulas for the mean and variance of a beta distribution. Thus,

, i = 1, ..., n. (15)

Again, proceeding as we did in section 2.2 above, we want to choose a sample size n so that we have at least 100(1 − α)% confidence for the true difference between the book value and the audited value, given a specified length and total book value. The interval for θ again extends a distance L/2 on either side of its center. Thus,

, i = 1, ..., n. (16)

Again, we proceed with the same methods as used above for the gamma posterior distribution, by using known properties of the normal distribution. Here,

, (17)

can be written as,

. (18)

Here the mean cancels in the numerator just as previously seen for the gamma posterior distribution. This cancellation leads to the much simpler equation (19),

. (19)

Monte Carlo integration was again used to get the estimate for n, using the fact that the errors follow the Binomial model, i = 1, ..., n. Sample sizes between 200 and 600 were used, and with each sample size n, 10,000 simulations were drawn. The optimal value for n was also found to be n = 16, the same sample size that was found to be optimal under the Poisson model.

Figures 4 and 5 below are plots of sample size vs. varying confidence levels. These were found using the formula for the normal approximation under the two different models respectively. Both Figures 4 and 5 are for fixed L = 0.001 and b̄ = 7043.

Figure 4: SSD For Normal Approximation Under The Poisson Model Posterior (sample size n)

Figures 4 and 5 demonstrate that the 95% confidence level is achieved for an integer sample size of 16 under both the normal approximation to the Poisson model and the binomial model, with the length held constant at .001 and the average book value b̄ = 7043.

Figure 5: SSD for Normal Approximation of SS Under The Binomial Model Posterior (sample size n)

Figures 6 and 7 show a sequence of values for length varying from .001 to .022, where the sample size is the smallest integer value for which the integral exceeds the required confidence level. So although the values for each sample size versus length are not the same, the integer value that solves the equation is the same in both Figures 6 and 7. This also means that the ratio of sample size between the two different models will be 1 for all values of length.

Figure 6: Normal Approximation Under Poisson Model vs. Length (sample size n versus length L)

Figures 6 and 7 have the same shape as Figures 1 and 2. This implies that the frequentist method and the Bayesian method produce similar results for sample size vs. length. Figures 6 and 7 are quite similar, and this is due to integer values for sample size being used as input.

Figure 7: Normal Approximation Under Binomial Model vs. Length (sample size n versus length L)

As the length gets smaller than L = 0.002 in both Figures 6 and 7, the sample size rapidly increases.

Section 3:

Bayesian Estimates for the Posterior Distribution of θ

Simply using the posterior distribution from the normal approximation to the Poisson model in section 2.1, we are able to set up the relationship below,

.

The estimates .0996 and 14.29 are estimates for the two parameters of the gamma prior distribution; they are calculated in section 2.1 of the Appendix. Simple algebraic manipulation of the above formula leads to the following inequality for n,

.

In Table A below, we use the above inequality for n and allow the z-score to vary, while setting   = 4226 as an initial estimate, holding   as fixed, and holding L = .001 as fixed; the corresponding values for sample size are given. In Table B below, we similarly use the inequality, hold the z-score constant at 1.96 and allow the length to vary; the corresponding values for sample size are given. For a z-score slightly greater than 1.96, the constant value chosen in sections 1 and 2, we have a corresponding sample size of 36.55. This value is larger than the nominal sample size under the frequentist method and under the previous Bayesian method, where the values of y were averaged over. Also, as the length varies (Table B) the values closely mimic the values depicted by Figures 1 and 6; these figures correspond to the frequentist Poisson approximation and the Bayesian Poisson approximation respectively.

Table A:                    Table B:

z-score   Sample Size       Length   Sample Size
1.28      23.63             0.001    36.18
1.38      25.47             0.002    18.09
1.48      27.32             0.003    12.06
1.58      29.16             0.004    9.04
1.68      31.01             0.005    7.23
1.78      32.86             0.006    6.03
1.88      34.7              0.007    5.17
1.98      36.55             0.008    4.52
2.08      38.4              0.009    4.02
2.18      40.24             0.01     3.62
2.28      42.09             0.011    3.29
2.38      43.93             0.012    3.01
2.48      45.78             0.013    2.78
2.58      47.63             0.014    2.58
2.68      49.47             0.015    2.41
2.78      51.32             0.016    2.26
2.88      53.16             0.017    2.12
2.98      55.01             0.018    2.01
3.08      56.86             0.019    1.9
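A simple arithmetic check, which is not part of the original analysis, suggests how the tabulated sample sizes behave: they appear to scale linearly in the z-score and inversely in the length. The sketch below computes n·L/z for every row of both tables, using the fixed values L = .001 (Table A) and z = 1.96 (Table B) stated in the text; the variable names are hypothetical.

```python
# (z-score, sample size) pairs from Table A, computed at fixed L = 0.001.
TABLE_A = [
    (1.28, 23.63), (1.38, 25.47), (1.48, 27.32), (1.58, 29.16), (1.68, 31.01),
    (1.78, 32.86), (1.88, 34.7), (1.98, 36.55), (2.08, 38.4), (2.18, 40.24),
    (2.28, 42.09), (2.38, 43.93), (2.48, 45.78), (2.58, 47.63), (2.68, 49.47),
    (2.78, 51.32), (2.88, 53.16), (2.98, 55.01), (3.08, 56.86),
]
# (length, sample size) pairs from Table B, computed at fixed z = 1.96.
TABLE_B = [
    (0.001, 36.18), (0.002, 18.09), (0.003, 12.06), (0.004, 9.04), (0.005, 7.23),
    (0.006, 6.03), (0.007, 5.17), (0.008, 4.52), (0.009, 4.02), (0.01, 3.62),
    (0.011, 3.29), (0.012, 3.01), (0.013, 2.78), (0.014, 2.58), (0.015, 2.41),
    (0.016, 2.26), (0.017, 2.12), (0.018, 2.01), (0.019, 1.9),
]
L_FIXED = 0.001
Z_FIXED = 1.96

# If n is proportional to z / L, then n * L / z is the same constant in every row.
ratios_a = [n * L_FIXED / z for z, n in TABLE_A]
ratios_b = [n * length / Z_FIXED for length, n in TABLE_B]

print(f"Table A: min {min(ratios_a):.5f}, max {max(ratios_a):.5f}")
print(f"Table B: min {min(ratios_b):.5f}, max {max(ratios_b):.5f}")
```

Up to the rounding of the printed values, the ratio is nearly constant across both tables, so halving the length doubles the tabulated sample size.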

Figures 8 and 9 show varying length vs. sample size for the posterior distribution of θ under the two respective models.

Figure 8: Normal Approximation to Posterior Distribution of θ in The Poisson Model (sample size n versus length L)

Figures 8 and 9 also have the same shape as Figures 1 and 2. Figure 8 (the Bayesian Poisson method) has slightly larger values than Figure 1 (the frequentist Poisson method); however, Figure 8 uses information for y that we do not have available before the sample is chosen. This is similarly the case if we compare Figure 9 (the Bayesian binomial method) to Figure 2 (the frequentist binomial method).

Figure 9: Normal Approximation to Posterior Distribution of θ in The Binomial Model (sample size n versus length L)

4. Zero Inflated Poisson Model

Zero-Inflated Poisson (ZIP) is a model to accommodate data with excess zeros. It assumes that with probability p the only possible observation is 0 (a true zero), and with probability 1 − p a Poisson random variable with mean b_i θ, i = 1, ..., n, is observed. If a company keeps accurate accounts receivable, there will be no errors, and this implies that there will be many zeros. Although the Poisson distribution includes zero, there are more zeros in the data than is appropriate for the standard Poisson model. The ZIP model will produce a more precise estimate. A thorough explanation of the ZIP model, the fitting of ZIP regression models, and the simulated behavior of their properties in manufacturing defect data is given in Lambert (1992).
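To make the mixture concrete, the following sketch evaluates the ZIP probability mass function described above and compares the chance of observing a zero under the plain Poisson model and under the ZIP model. The function names and the numeric values (p = 0.8, a book value of 5000 dollars, θ = 0.01) are hypothetical and chosen only for illustration.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with mean lam > 0, computed on the log scale."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


def zip_pmf(k, p, lam):
    """P(Y = k) when Y is a structural zero with probability p and Poisson(lam) otherwise."""
    poisson_part = (1 - p) * poisson_pmf(k, lam)
    return p + poisson_part if k == 0 else poisson_part


# Hypothetical account: book value 5000 dollars, error rate 0.01 per dollar, p = 0.8.
p_zero, lam = 0.8, 5000 * 0.01

print(f"P(0) under Poisson: {poisson_pmf(0, lam):.3e}")
print(f"P(0) under ZIP:     {zip_pmf(0, p_zero, lam):.3f}")
```

With a mean of this size the plain Poisson model makes a zero error essentially impossible, while the ZIP model attributes almost all of its zero probability to the point mass p.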

In the ZIP model, the responses Z_i are independent and  , i = 1, ..., n.

Our model for the error under the zero inflation is given by  . Below is a table for the joint distribution of Z_i and the error. The joint distribution defines the probability of events in terms of both Z_i and the error.

0

The likelihood function under the Zero-Inflated Poisson model is therefore given by equation (20),

, i = 1, ..., n. (20)

Under this model, θ and p are nuisance parameters and need to be estimated. Estimates for θ and p, namely θ̂ and p̂, can be estimated by  , where   is the covariance matrix of θ̂ and p̂, and is calculated based on the second partial derivatives with respect to θ and p. The second partial derivatives are given as equations (21), (22) and (23). Detailed steps which led to these results are given in section 4 of the Appendix. The second partial derivatives are given by,

, (21)

, (22)

and

. (23)

These are the elements of the Hessian matrix, and they are used to solve for the covariance matrix of θ̂ and p̂ by taking the negative of the Hessian matrix and then solving for the inverse. Once we have determined the covariance matrix, only the (1,1) element of the matrix is of interest, because it is the variance of θ̂. Detailed steps can be found in section 4 of the Appendix. The estimate of θ is approximately normal, and is given by,

,

where  . (24)

We need to get estimates p̂ and θ̂ for the nuisance parameters p and θ before we can continue. Numerical methods were used because of the complexity of the problem. The EM algorithm yielded estimates of p̂ = .78882 and θ̂ = .01113, and the Nelder-Mead method yielded estimates of p̂ = .79991 and θ̂ = .01143. These were calculated by first finding estimates for the Z_i's. Finding estimates for the Z_i's was necessary because Z_i is an indicator variable; the sequence would not converge well if the Z_i's were not first estimated by the equations,

and  .

The Z_i's follow a Bernoulli distribution  , i = 1, ..., n.

The expectations for the Z_i's are given by the equations,

and  .

The expectation for Z_i can then be plugged into the equations for p and θ; the numerical methods were then performed and resulted in the estimates previously stated. After obtaining estimates for the nuisance parameters, the EM method is preferred to the Nelder-Mead method, so the EM estimates were used in calculating the variance. θ̂ is distributed as

.
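The EM iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the author's implementation: it uses the 20 rows of Table 1 as data, takes the Poisson mean of account i to be b_i·θ, sets the E-step weight for a zero observation to p / (p + (1 − p)·exp(−b_i·θ)) and to 0 for a non-zero observation, and updates p and θ in closed form in the M-step. The function name `zip_em` and the starting values are hypothetical.

```python
from math import exp

# (book value, error) pairs taken from Table 1.
DATA = [
    (6842, 0), (16350, 0), (3935, 0), (7090, 40), (5533, 0), (2163, 0),
    (2399, 250), (8941, 0), (3716, 0), (8663, 0), (69540, 540), (6881, 0),
    (70100, 0), (6467, 0), (21000, 0), (3847, 0), (2422, 0), (2291, 100),
    (4667, 0), (31257, 0),
]


def zip_em(data, p=0.5, theta=0.005, max_iter=200, tol=1e-12):
    """EM for a zero-inflated Poisson model with mean b_i * theta (illustrative sketch)."""
    total_error = sum(y for _, y in data)
    for _ in range(max_iter):
        # E-step: probability that each observation is a structural zero.
        z = [p / (p + (1 - p) * exp(-b * theta)) if y == 0 else 0.0
             for b, y in data]
        # M-step: closed-form updates for p and theta given the weights.
        p_new = sum(z) / len(data)
        exposure = sum((1 - zi) * b for zi, (b, _) in zip(z, data))
        theta_new = total_error / exposure
        converged = abs(p_new - p) < tol and abs(theta_new - theta) < tol
        p, theta = p_new, theta_new
        if converged:
            break
    return p, theta


p_hat, theta_hat = zip_em(DATA)
print(f"p_hat = {p_hat:.5f}, theta_hat = {theta_hat:.5f}")
```

On these data the sketch settles at roughly p̂ = 0.80 and θ̂ = 0.0114, close to the Nelder-Mead estimates quoted above; it does not attempt to reproduce the slightly different EM values reported in the text, which may reflect different starting values or stopping rules.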

We call this variance "a" and use our estimates  ,  ,  ,   to solve for  ; we use this factor in   = 1.687012 to investigate the sample size given by the ZIP model in comparison to the sample size under the standard Poisson model. This way we will have results similar to those that were given under the frequentist Poisson model, but scaled by a factor to make for ease in comparison of the two models. Supplementary calculations can be found in section 4 of the Appendix. The calculations for the sample size n under the ZIP model follow similarly to the frequentist method used in section 1, but beginning with

. (25)

Multiplying and dividing through by   so that we will have a sample size that is easily compared to the Poisson frequentist model results in

. (26)

An interval for θ will be

. (27)

After some algebraic manipulation, with appropriate intermediate steps found in section 4 of the Appendix, the length of the interval under the ZIP model is

. (28)

After rearranging the length equation to solve for n, we are left with equation (29),

. (29)

The supporting algebra is found in section 4 of the Appendix. The expression for n under the Zero-Inflated Poisson model is very similar to the expression for n under the frequentist Poisson model. The only difference between the two equations for n is a factor a*. Figure 10 below displays the sample size vs. length under the Zero-Inflated Poisson model.

Figure 10: SSD Under The Zero-Inflated Poisson Model (sample size n versus length L)

The graph for the ZIP model (Figure 10) has a slightly higher sample size for any given length than under the Poisson frequentist model (Figure 1). Under the ZIP model we expect increased precision; however, this increased precision requires a larger sample size to attain.

Conclusion:

We proposed a method using the Zero-Inflated Poisson (ZIP) model, which explicitly considers zero versus non-zero errors. This model is favorable because of the excess zeros present in auditing data, which the standard Poisson model does not account for. This method could easily be extended to data similar to accounting populations. The estimated necessary sample size was larger under the ZIP model than under the standard Poisson model. However, given the excess zeros in the data set, it would be reasonable to assume that the larger sample size is necessary for increased precision. Further research and investigation are needed to examine more precisely the benefits under the ZIP model. Below is a table summarizing the results of the methods from the four sections of this paper when the length is L = 0.001, α is fixed at .05 (z-score 1.96), and b̄ = 7043.

                        Poisson   Binomial   ZIP
Frequentist             15.29     22.72      41.02
Bayesian (section 2)    16        16
Bayesian (section 3)    36.18

The ratio of the Poisson frequentist sample size to the binomial frequentist sample size is 0.67. Figure 3 in section 1 of the paper depicted the frequentist binomial sample size being larger, until the length of the interval reached 0.013. The posterior distribution of θ for the Poisson model had a nominal sample size of n = 36.18; this was the value closest to our ZIP sample size estimate of n = 41.02. However, as stated in section 3 of this paper, information for y was used that would not be available before deciding the number of accounts to select for the sample. The estimates for the sample size under the Bayesian method where Monte Carlo integration was implemented resulted in a sample size of 16 under both the Poisson and binomial models. This was because we chose, as our optimal sample size, the first integer sample size with a length equal to 0.001 that has greater than .95 confidence.


REFERENCES

Baker, Robert L., Copeland, Ronald M., (1979). Evaluation of the Stratified Regression Estimator for Auditing Accounting Populations. Journal of Accounting Research, 17: 606-617.

Berg, Nathan. (2006). A Simple Bayesian Procedure for Sample Size Determination in an Audit of Property Value Appraisals. Real Estate Economics, 34: 133-155.

Casella, G., Berger, R. L., (2002). Statistical Inference, 2nd Ed. Duxbury Press.

Cockburn, I. M., Puterman, M. L., and Wang, P. (1998). Analysis of Patent Data - A Mixed-Poisson-Regression-Model Approach. Journal of Business & Economic Statistics, 16: 27-41.

Higgins, H. N., Nandram, B., (2009). Monetary Unit Sampling: Improving Estimation of the Total Audit Error. Advances in Accounting, Incorporating Advances in International Accounting, 25, 2: 174-182.

Jiroutek, M. R., Muller, K. E., Kupper, Lawrence L., Stewart, P. W., (2003). A New Method for Choosing Sample Size for Confidence Interval-Based Inferences. Biometrics, 59: 580-590.

Kaplan, Robert S., (1973). Statistical Sampling in Auditing with Auxiliary Information Estimators. Journal of Accounting Research, 11: 238-258.

Knight, P., (1979). Statistical Sampling in Auditing: An Auditor's Viewpoint. Journal of the Royal Statistical Society, 28: 253-266.

Lambert, D., (1992). Zero-Inflated Poisson Regression, With an Application to Defects in Manufacturing. Technometrics, 34: 1-14.

Lohr, S. L., (1999). Sampling: Design and Analysis. New York: Duxbury Press.

McCray, John H., (1984). A Quasi-Bayesian Audit Risk Model for Dollar Unit Sampling. The Accounting Review, LIX, No. 1: 35-41.

Menzefricke, U., (1983). On Sampling Plan Selection with Dollar-Unit Sampling. Journal of Accounting Research, 21: 96-105.

Menzefricke, U., (1984). Using Decision Theory for Planning Audit Sample Size with Dollar Unit Sampling. Journal of Accounting Research, 22: 570-587.

Ponemon, Lawrence A., Wendell, John P., (1995). Judgemental Versus Random Sampling in Auditing: An Experimental Investigation. Auditing: A Journal of Practice and Theory, 14: 17-34.

Rohrback, Kermit J., (1986). Monetary Unit Acceptance Sampling. Journal of Accounting Research, 24: 127-150.

Sahu, S. K., Smith, T. M. F. (2006). A Bayesian Method of Sample Size Determination with Practical Applications. Journal of the Royal Statistical Society, 169: 235-253.

Wang, F., and Gelfand, A. E., (2002). A Simulation-Based Approach to Bayesian Sample Size Determination for Performance Under a Given Model and for Separating Models. Statistical Science, 17: 193-208.


APPENDIX

Section 1:

1.1 Frequentist Approximation to the Poisson model:

, i = 1, ..., n.

(1)

The length of the interval for θ will be:

(2)

  is estimated by

After substituting in for  , we can solve for n:

(3)

1.2 Frequentist Approximation to the Binomial model:

, i = 1, ..., n.

(4)

(5)

And the expectation of  . This implies, after filling in for   and  , that:

After multiplying through top and bottom by n^2 the resulting equation is:

Simply rearranging the equation and taking the square root of both sides results in:

We will call the square root term "a"

(6)

And we continue solving the equation for n:

(7)

And proceed using a simple iterative algorithm to get values for n for varying lengths.

1.3 Frequentist Approximation to the simpler Binomial model:

(8)

Let

(9)

Take out 2n*Z from under the radical; the n* in the numerator then cancels with the n* in the denominator.

(10)

Now to use   as an estimate for X, we know that

And after substituting in we would have

However, a better approximation would involve:

And after substitution we are left with

Let  , and add and subtract 1 from the left-hand side of the equation:

Then square both sides:

This implies:

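Section 2:

2.1 Bayesian Normal approximation, under the Poisson model to the Gamma Posterior Distribution:

, i = 1, ..., n, which can be written as  , i = 1, ..., n. In Bayesian statistics, this is called the likelihood.

We assume prior distribution  . The posterior distribution is proportional to the Prior * Likelihood.

Solving for the two prior parameters given the estimates of the mean and the variance:

now substitute in to solve for  :

2.2 Solving for n, using the Normal Approximation for the Posterior of θ under the Poisson Model:

(11)

(12)

(13)

(14)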

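2.3 Bayesian Normal approximation, under the Binomial model to the Beta Posterior Distribution:

, i = 1, ..., n.

The posterior distribution again is given by:

Solving for the two prior parameters given the estimates of the mean and the variance:

Simply solving the system of equations results in the estimates .09271 and 13.1514.

2.4 Solving for n, using the Normal Approximation for the Posterior of θ under the Binomial Model:

(15)

(16)

(17)

(18)

(19)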

Section 4: Zero Inflated Poisson model:

First, solving for the second partial derivatives of the likelihood function:

The likelihood function is

(20)

(21)

Let  ; then by the quotient rule:

(22)

Let  ; then by the quotient rule:

(23)

The Hessian matrix will therefore be:

We first take the negative of the Hessian matrix:

Next, to obtain the inverse of this matrix:

The (1,1) element of this matrix will be:

(24)

We can use  ; after substituting this into the equation:

Although  , it is approximately, so we will use   to substitute into our equations.

Using a common denominator

We call this "a" and use our estimates  ,  ,  ,   to solve for  ; we use this factor in   = 1.687012 to investigate the sample size given in the ZIP model in comparison to the sample size under the standard Poisson model.

(25)

(26)

(27)

The formula for the length will be:

(28)

The interval for n will be:

(31)

You'll notice that this is very similar to the interval for n in the frequentist Poisson case, except it is scaled by a factor of "a".
