A Computing Bias in Estimating the Probability of Informed Trading
Hsiou-Wei W. Lin, Wen-Chyan Ke
Journal of Financial Markets, forthcoming
Introduction(1/5)
Since Easley et al. (1996, 2002) proposed a structural model for estimating the probability of informed trading (PIN) by maximum likelihood estimation (MLE), PIN has been widely employed in market microstructure studies.
To obtain PIN using numerical MLE, researchers need to count the daily numbers of buyer- and seller-initiated trades (or buys and sells) for a stock.
Large numbers of buys and sells may cause the power functions embedded in the likelihood to generate values that exceed the range of real numbers a computer program can handle.
Such overflow or underflow is referred to in computing science as a floating-point exception (FPE).
Introduction(2/5)
Introduction(3/5)
Floating point is the most common representation of real numbers on digital computers today.
Its effective range is approximately $\pm 10^{308.25}$.
For example, computing exp(710), which exceeds $10^{308.25}$, results in an overflow (i.e., an FPE).
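The overflow threshold mentioned above can be checked directly. The following is a minimal sketch in Python, which raises OverflowError when exp() leaves the double-precision range; the helper name overflows is hypothetical and not part of the slides.

```python
import math
import sys

def overflows(x):
    """Return True if exp(x) cannot be represented as a double."""
    try:
        math.exp(x)
    except OverflowError:
        return True
    return False

# Largest finite double, roughly 1.797e308 (about 10**308.25).
print(sys.float_info.max)

# exp(709) is still representable, exp(710) is not.
print(overflows(709.0))
print(overflows(710.0))
```

Other environments may return infinity instead of raising an error, but the representable range is the same.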
Introduction(4/5)
FPE narrows the set of feasible solutions for the optimization problem of MLE and may, in fact, eliminate estimates for the actual parameters.
FPE appears to have had more impact on PIN estimates as trading has become more frequent in recent years. Easley et al. (2005) do not have PIN estimates for 3.6% of the stocks in their 2001 sample. Yan and Zhang (2006) report that 3.8% of the stocks lack PIN estimates in their 2004 sample.
FPE bias may also overstate the relation between the PIN estimates and trading frequency (e.g., the daily number of trades).
We propose a remedial approach: reformulating the likelihood function.
Introduction(5/5)
Figure: Tree diagram of the trading process (Easley et al., 2002).
Information Event Occurs with Probability $\alpha$
    Signaling Bad News with Probability $\delta$
        Buy Arrival Rate: $\varepsilon_b$
        Sell Arrival Rate: $\varepsilon_s + \mu$
    Signaling Good News with Probability $(1-\delta)$
        Buy Arrival Rate: $\varepsilon_b + \mu$
        Sell Arrival Rate: $\varepsilon_s$
Information Event Does Not Occur with Probability $(1-\alpha)$
        Buy Arrival Rate: $\varepsilon_b$
        Sell Arrival Rate: $\varepsilon_s$
PIN Estimation(1/2)
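On day $i$, the joint probability density function of $(B_i, S_i)$, the observed numbers of buys and sells, is specified by
$$
\begin{aligned}
f(B_i, S_i \mid \theta \equiv (\alpha, \delta, \mu, \varepsilon_b, \varepsilon_s))
\equiv{}& \alpha\delta\, \exp(-\varepsilon_b)\frac{\varepsilon_b^{B_i}}{B_i!}\, \exp(-(\varepsilon_s+\mu))\frac{(\varepsilon_s+\mu)^{S_i}}{S_i!} \\
&+ \alpha(1-\delta)\, \exp(-(\varepsilon_b+\mu))\frac{(\varepsilon_b+\mu)^{B_i}}{B_i!}\, \exp(-\varepsilon_s)\frac{\varepsilon_s^{S_i}}{S_i!} \\
&+ (1-\alpha)\, \exp(-\varepsilon_b)\frac{\varepsilon_b^{B_i}}{B_i!}\, \exp(-\varepsilon_s)\frac{\varepsilon_s^{S_i}}{S_i!}.
\end{aligned}
$$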
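Each term of this density is a product of Poisson probabilities, and those are where the powers and factorials enter. The sketch below evaluates one Poisson factor directly and in logs. The function names are hypothetical, and the log form uses the standard identity log(k!) = lgamma(k + 1).

```python
import math

def poisson_pmf_naive(k, lam):
    """exp(-lam) * lam**k / k!, evaluated factor by factor."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_logpmf(k, lam):
    """log of the same quantity; lgamma(k + 1) equals log(k!)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

# Small counts: both forms agree.
print(math.log(poisson_pmf_naive(3, 2.0)), poisson_logpmf(3, 2.0))

# Counts typical of an active stock: the power lam**k leaves the double range.
try:
    poisson_pmf_naive(600, 500.0)
except OverflowError as err:
    print("naive form fails:", err)
print(poisson_logpmf(600, 500.0))
```

The direct form fails even though the probability itself is an ordinary number between 0 and 1; only the intermediate values are too large.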
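PIN Estimation(2/2)
The estimate of $\theta$ from MLE, denoted as $\hat\theta \equiv (\hat\alpha, \hat\delta, \hat\mu, \hat\varepsilon_b, \hat\varepsilon_s)$, is obtained by solving the following problem:
$$\underset{\theta \in BFS}{\text{Maximize}}\ L_B(\theta|T), \qquad (1)$$
where $L_B(\theta|T) \equiv \sum_{i=1}^{I} L_B(\theta|B_i,S_i) \equiv \sum_{i=1}^{I} \log f(B_i,S_i|\theta)$;
$T \equiv ((B_1, S_1), (B_2, S_2), (B_3, S_3), \ldots, (B_I, S_I))$;
and $BFS \equiv \{\theta \equiv (\alpha, \delta, \mu, \varepsilon_b, \varepsilon_s) \mid \mu, \varepsilon_b, \varepsilon_s \ge 0 \text{ and } 0 \le \alpha, \delta \le 1\}$, the boundary constraint.
The PIN estimate is $\widehat{PIN} \equiv \hat\alpha\hat\mu / (\hat\alpha\hat\mu + \hat\varepsilon_b + \hat\varepsilon_s)$, which is the ratio of mean informed trade to mean total trade.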
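As a small worked illustration of the PIN ratio, the sketch below computes it from a parameter vector. The function name pin and the numbers are illustrative assumptions, not estimates from the study.

```python
def pin(alpha, mu, eps_b, eps_s):
    """Mean informed trade (alpha * mu) over mean total trade."""
    informed = alpha * mu
    return informed / (informed + eps_b + eps_s)

# Illustrative values only: alpha = 0.4, mu = 300, eps_b = eps_s = 400
# give 120 / (120 + 800).
print(pin(0.4, 300.0, 400.0, 400.0))
```

Holding the other parameters fixed, a smaller estimate of the informed arrival rate lowers PIN, which is the channel the bias discussion relies on.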
Bias and Remedy(1/8)
Given $T$ and $L_B(\theta|T)$, there exists a subset of BFS:
$$BFS_{L_B|T} \equiv \{\theta \in BFS \mid L_B(\theta|T) \text{ does not lead to the FPE}\},$$
where $L_B(\theta|T)$ can be accurately expressed as a floating-point number in the computing process.
The boundary of $BFS_{L_B|T}$ varies with $T$ and $L_B(\theta|T)$.
If we obtain a (local) solution to problem (1) via a numerical method subject to FPE, we are, in practice, solving the following problem (2):
$$\underset{\theta \in BFS_{L_B|T}}{\text{Maximize}}\ L_B(\theta|T). \qquad (2)$$
Bias and Remedy(2/8)
In the presence of FPE, problem (2) deviates from problem (1). Depending on $T$ and $L_B(\theta|T)$, the relation between $\hat\theta$ and $\hat\theta_{L_B|T}$, the solutions to (1) and (2), respectively, is complicated. Larger $B_i$ and $S_i$, which are likely to be observed during periods of more frequent trading, correspond to a smaller $BFS_{L_B|T}$ and thus a more pronounced difference between $\hat\theta$ and $\hat\theta_{L_B|T}$.
If there is no solution to problem (2), FPE causes a selection bias as in prior studies (e.g., Easley et al., 2002, 2005; Yan and Zhang, 2006).
If a numerical solution $\hat\theta_{L_B|T}$ to problem (2) is obtained, $\hat\theta_{L_B|T}$ is in $BFS_{L_B|T}$ but deviates from $\theta$, which is the actual parameter vector and may be in $BFS \setminus BFS_{L_B|T}$. Such a situation leads to a bias in the estimate of $\theta$ and thus a downward bias in the PIN estimate.
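Bias and Remedy(3/10) - A simple example
Let $\varepsilon_b = \varepsilon_s = \varepsilon$ and rewrite $L_B(\theta|B_i,S_i)$ as follows:
$$
\begin{aligned}
L(\theta|B_i,S_i) \equiv{}& \log\big[\alpha\delta \exp(-\mu)\exp(S_i\log(1+\mu/\varepsilon)) + \alpha(1-\delta)\exp(-\mu)\exp(B_i\log(1+\mu/\varepsilon)) + (1-\alpha)\big] \\
&+ (B_i+S_i)\log(\varepsilon) - 2\varepsilon - \log(S_i!\,B_i!). \qquad (S1)
\end{aligned}
$$
With $L(\theta|T) \equiv \sum_{i=1}^{I} L(\theta|B_i,S_i)$, we derive:
$$BFS_{L|T} \equiv \{\theta \in BFS \mid L(\theta|T) \text{ does not lead to the FPE}\} = \{\theta \in BFS \mid \max(S_{max}, B_{max})\log(1+\mu/\varepsilon) \le E \approx 710\}, \qquad (S2)$$
where $S_{max} \equiv \max\{S_i, i = 1, 2, 3, \ldots, I\}$, $B_{max} \equiv \max\{B_i, i = 1, 2, 3, \ldots, I\}$, and
$E \equiv \min\{e > 0 \mid \text{input value } e \text{ leads to overflow of the exponential function } \exp(\cdot)\}$.
Note: $(1+\mu/\varepsilon)^{S_i} = \exp(S_i\log(1+\mu/\varepsilon))$.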
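Bias and Remedy(4/10) - A simple example
Rewrite $L(\theta|B_i,S_i)$ as $\tilde L(\theta|B_i,S_i)$:
$$
\begin{aligned}
\tilde L(\theta|B_i,S_i) \equiv{}& \log\big[\alpha\delta \exp(-\mu + S_i\log(1+\mu/\varepsilon)) + \alpha(1-\delta)\exp(-\mu + B_i\log(1+\mu/\varepsilon)) + (1-\alpha)\big] \\
&+ (B_i+S_i)\log(\varepsilon) - 2\varepsilon - \log(S_i!\,B_i!). \qquad (S3)
\end{aligned}
$$
Let $\tilde L(\theta|T) \equiv \sum_{i=1}^{I} \tilde L(\theta|B_i,S_i)$. Then we obtain $BFS_{\tilde L|T}$ analogous to $BFS_{L|T}$:
$$BFS_{\tilde L|T} \equiv \{\theta \in BFS \mid \tilde L(\theta|T) \text{ does not lead to the FPE}\} = \{\theta \in BFS \mid \max(S_{max}, B_{max})\log(1+\mu/\varepsilon) - \mu \le E\}. \qquad (S4)$$
For comparison, $BFS_{L|T} = \{\theta \in BFS \mid \max(S_{max}, B_{max})\log(1+\mu/\varepsilon) \le E\}$.
The difference comes from replacing the product $\exp(-\mu)\exp(S_i\log(1+\mu/\varepsilon))$ with the single exponential $\exp(-\mu + S_i\log(1+\mu/\varepsilon))$.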
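The effect of merging the two exponentials can be seen numerically. The sketch below evaluates the informed-trade factor both ways, assuming the simple case of a common uninformed rate; the function names and parameter values are illustrative assumptions.

```python
import math

def informed_term_two_step(mu, eps, s):
    """Product of two exponentials, as in (S1)."""
    return math.exp(-mu) * math.exp(s * math.log1p(mu / eps))

def informed_term_merged(mu, eps, s):
    """Single exponential of the summed exponent, as in (S3)."""
    return math.exp(-mu + s * math.log1p(mu / eps))

mu, eps, s = 300.0, 100.0, 600

# s * log(1 + mu/eps) is about 832, beyond the overflow threshold,
# so the second factor of the two-step form cannot be computed.
try:
    informed_term_two_step(mu, eps, s)
except OverflowError as err:
    print("two-step form fails:", err)

# The merged exponent is about 532, well inside the range.
print(informed_term_merged(mu, eps, s))
```

The merged form still overflows once the summed exponent itself exceeds the threshold, which is why the feasible set is enlarged rather than made unrestricted.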
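Bias and Remedy(5/10) - A simple example
Assume that the unbiased estimate $\hat\theta \in BFS \setminus BFS_{L|T}$ is the solution to (1) and that $\hat\theta_{L|T}$ is the solution to (S5) and thus naturally in $BFS_{L|T}$.
$$\underset{\theta \in BFS_{L|T}}{\text{Maximize}}\ L(\theta|T). \qquad (S5)$$
Assume further that the estimates of $\alpha$ and $\delta$ are the same in $\hat\theta$ and $\hat\theta_{L|T}$, because the estimation of these two parameters, which are bounded by zero and one, is not the main factor leading to the overflow.
With $(\hat\alpha, \hat\delta, \hat\mu, \hat\varepsilon, \hat\varepsilon) \in BFS \setminus BFS_{L|T}$ and $(\hat\alpha_{L|T}, \hat\delta_{L|T}, \hat\mu_{L|T}, \hat\varepsilon_{L|T}, \hat\varepsilon_{L|T}) \in BFS_{L|T}$, we obtain:
$$\log(1+\hat\mu_{L|T}/\hat\varepsilon_{L|T}) \le \frac{E}{\max(S_{max}, B_{max})} < \log(1+\hat\mu/\hat\varepsilon).$$
Thus,
$$PIN_{L|T} = \frac{1}{1 + 2\hat\varepsilon_{L|T}/(\hat\alpha_{L|T}\hat\mu_{L|T})} < \frac{1}{1 + 2\hat\varepsilon/(\hat\alpha\hat\mu)} = \widehat{PIN}$$
because $\hat\alpha_{L|T} = \hat\alpha$ with the assumptions as stated.
Note: $BFS_{L|T} = \{\theta \in BFS \mid \max(S_{max}, B_{max})\log(1+\mu/\varepsilon) \le E\} = \{\theta \in BFS \mid \log(1+\mu/\varepsilon) \le E/\max(S_{max}, B_{max})\}$.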
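The inequality above implies a ceiling on the PIN that a computation restricted to the FPE-free set can return. The sketch below computes that ceiling for the simple example with a common uninformed rate. The names max_mu_over_eps and pin_cap are hypothetical, t_max stands for max(S_max, B_max), and taking E as the log of the largest double is an assumption (the slides round it to 710).

```python
import math
import sys

# Assumed overflow threshold: exp(x) overflows once x exceeds this value.
E = math.log(sys.float_info.max)

def max_mu_over_eps(t_max, e=E):
    """Largest mu/eps allowed by log(1 + mu/eps) <= e / t_max."""
    return math.expm1(e / t_max)

def pin_cap(t_max, alpha=1.0, e=E):
    """Ceiling on PIN = 1 / (1 + 2*eps/(alpha*mu)); alpha = 1 is the loosest case."""
    ratio = max_mu_over_eps(t_max, e)
    return 1.0 / (1.0 + 2.0 / (alpha * ratio))

for t_max in (500, 1000, 5000):
    print(t_max, pin_cap(t_max))
```

The ceiling falls as the largest daily count grows, which matches the argument that active stocks are the ones affected.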
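Bias and Remedy(7/10)
$L_I(\theta|T) \equiv \sum_{i=1}^{I} L_I(\theta|B_i,S_i)$, which is proposed by Easley et al. (2005); $L_I(\theta|B_i,S_i)$ is obtained by reformulating $L_B(\theta|B_i,S_i)$ as
$$
\begin{aligned}
L_I(\theta|B_i,S_i) \equiv{}& \log\big[\alpha\delta\exp(-\mu)\, x_b^{B_i-M_i} x_s^{-M_i} + \alpha(1-\delta)\exp(-\mu)\, x_b^{-M_i} x_s^{S_i-M_i} + (1-\alpha)\, x_b^{B_i-M_i} x_s^{S_i-M_i}\big] \\
&+ B_i\log(\varepsilon_b+\mu) + S_i\log(\varepsilon_s+\mu) - (\varepsilon_b+\varepsilon_s) + M_i(\log(x_b)+\log(x_s)) - \log(S_i!\,B_i!),
\end{aligned}
$$
where $M_i \equiv \min(B_i, S_i) + \max(B_i, S_i)/2$, $x_b \equiv \varepsilon_b/(\varepsilon_b+\mu)$, and $x_s \equiv \varepsilon_s/(\varepsilon_s+\mu)$.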
Construct $BFS^{E}_{L_I|T} \equiv \{\theta \in BFS \mid L_I(\theta|T) \text{ does not lead to the FPE}\}$ analogous to $BFS_{L_B|T}$.
This set is characterized by inequalities, holding for $i = 1, 2, 3, \ldots, I$, that bound combinations of $M_i(\log(x_b)+\log(x_s))$, $B_i\log(x_b)$, $S_i\log(x_s)$, and $B_i\log(x_b)+S_i\log(x_s)$ by $E$,
where $E > 0$ denotes the minimum of numbers leading to overflow for the exponential function $\exp(\cdot)$ in the computing process.
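Bias and Remedy(8/10)
With the following $BFS^{L}_{L_I|T}$ and $BFS^{U}_{L_I|T}$, we could argue that $L_I(\theta|T)$ results in downward-biased PIN estimates for stocks with a large number of trades.
$BFS^{E}_{L_I|T} \supseteq BFS^{L}_{L_I|T} \equiv \{\theta \in BFS \mid -M_{max}(\log(x_b)+\log(x_s)) \le E\}$ and
$BFS^{E}_{L_I|T} \subseteq BFS^{U}_{L_I|T} \equiv \{\theta \in BFS \mid -M_{max}\max(\log(x_b), \log(x_s)) \le E\}$,
where $M_{max} \equiv \max\{M_i, i = 1, 2, 3, \ldots, I\}$.
Slide annotations: $\log(x_b)$ and $\log(x_s)$ are less than zero. When $M_{max}$ is large, $M_{max}$ dominates, so $\log(x_b)$ and $\log(x_s)$ are pushed toward 0; that is, $x_b = \varepsilon_b/(\mu+\varepsilon_b)$ and $x_s = \varepsilon_s/(\mu+\varepsilon_s)$ are pushed toward 1 and $\mu$ toward 0, so $PIN = \alpha\mu/(\alpha\mu+\varepsilon_b+\varepsilon_s)$ is underestimated.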
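Bias and Remedy(9/10)
To reduce the FPE bias, we first present the accurate likelihood expression $L_A(\theta|T) \equiv \sum_{i=1}^{I} L_A(\theta|B_i,S_i)$, where $L_A(\theta|B_i,S_i)$ is as follows:
$$
\begin{aligned}
L_A(\theta|B_i,S_i) \equiv{}& \log\big[\alpha\delta\exp(e_{1i}-e_{max\,i}) + \alpha(1-\delta)\exp(e_{2i}-e_{max\,i}) + (1-\alpha)\exp(e_{3i}-e_{max\,i})\big] \\
&+ B_i\log(\varepsilon_b+\mu) + S_i\log(\varepsilon_s+\mu) - (\varepsilon_b+\varepsilon_s) + e_{max\,i} - \log(S_i!\,B_i!),
\end{aligned}
$$
where $e_{1i} \equiv -\mu - B_i\log(1+\mu/\varepsilon_b)$, $e_{2i} \equiv -\mu - S_i\log(1+\mu/\varepsilon_s)$, $e_{3i} \equiv -B_i\log(1+\mu/\varepsilon_b) - S_i\log(1+\mu/\varepsilon_s)$, and $e_{max\,i} \equiv \max(e_{1i}, e_{2i}, e_{3i})$.
Construct $BFS^{E}_{L_A|T}$ analogous to $BFS_{L_B|T}$ as follows:
$$BFS^{E}_{L_A|T} \equiv \{\theta \in BFS \mid L_A(\theta|T) \text{ does not lead to the FPE}\} = \{\theta \in BFS \mid \max(e_{1i}-e_{max\,i},\, e_{2i}-e_{max\,i},\, e_{3i}-e_{max\,i}) \le E,\ i = 1, 2, 3, \ldots, I\} = BFS,$$
where $e_{1i}-e_{max\,i}$, $e_{2i}-e_{max\,i}$, and $e_{3i}-e_{max\,i}$ are never positive and are therefore less than $E > 0$ for each $\theta \in BFS$.
Slide annotations: the value in the log function is greater than 1; the value is less than 0; $\exp(x)\, b^y c^z = \exp(x + y\log(b) + z\log(c))$.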
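The accurate expression translates almost line by line into code. The following is a minimal sketch for one trading day, compared against a term-by-term evaluation of the original density. The function names are hypothetical, lgamma is used for the log-factorials, and strictly positive uninformed rates with 0 < alpha < 1 are assumed.

```python
import math

def log_lik_accurate(alpha, delta, mu, eps_b, eps_s, b, s):
    """Per-day L_A for counts (b, s); assumes eps_b, eps_s > 0 and 0 < alpha < 1."""
    e1 = -mu - b * math.log1p(mu / eps_b)
    e2 = -mu - s * math.log1p(mu / eps_s)
    e3 = -b * math.log1p(mu / eps_b) - s * math.log1p(mu / eps_s)
    e_max = max(e1, e2, e3)
    # Every exponent below is <= 0, so exp() cannot overflow.
    inner = (alpha * delta * math.exp(e1 - e_max)
             + alpha * (1.0 - delta) * math.exp(e2 - e_max)
             + (1.0 - alpha) * math.exp(e3 - e_max))
    return (math.log(inner)
            + b * math.log(eps_b + mu) + s * math.log(eps_s + mu)
            - (eps_b + eps_s) + e_max
            - math.lgamma(b + 1) - math.lgamma(s + 1))

def log_lik_direct(alpha, delta, mu, eps_b, eps_s, b, s):
    """log f(b, s | theta) evaluated term by term; only safe for small counts."""
    def pois(k, lam):
        return math.exp(-lam) * lam ** k / math.factorial(k)
    f = (alpha * delta * pois(b, eps_b) * pois(s, eps_s + mu)
         + alpha * (1.0 - delta) * pois(b, eps_b + mu) * pois(s, eps_s)
         + (1.0 - alpha) * pois(b, eps_b) * pois(s, eps_s))
    return math.log(f)

# Small counts: the two evaluations agree.
small = (0.4, 0.5, 30.0, 40.0, 40.0, 50, 45)
print(log_lik_accurate(*small), log_lik_direct(*small))

# Large counts: only the accurate form can be evaluated.
large = (0.4, 0.5, 300.0, 1000.0, 1000.0, 2500, 900)
print(log_lik_accurate(*large))
```

Summing the per-day values over all days gives the objective that an optimizer would maximize; the optimizer itself is outside the scope of this sketch.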
Bias and Remedy(10/10)
Based on $L_B(\theta|T)$, $L_A(\theta|T)$ is derived from the following two principles:
In computing $e^x e^y$ (or $x e^y$), the expression $e^{x+y}$ (or $\mathrm{sign}(x)\, e^{\log(|x|)+y}$) is more stable than $e^x e^y$ (or $x e^y$). For example, when $x = 800$ and $y = -400$, $e^x e^y = e^{800} e^{-400}$ overflows, whereas $e^{x+y} = e^{400}$ does not.
In computer arithmetic, the absolute computing error of a function $f(x)$ increases with $|f'(x)|$. Therefore, a large input value for $\exp(\cdot)$ and, conversely, a small positive input value for $\log(\cdot)$ should be avoided. Namely, $\log(e^{(x+y)-m} + e^{z-m}) + m$ with $m \equiv \max(x+y, z)$ is more accurate than $\log(e^{x+y} + e^{z})$ in computing, for example with $x = 800$ and $y = -400$.
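Both principles can be tried out on the numbers from the slide. This is a minimal sketch; the helper log_sum_exp is a hypothetical name for the shifted form, and the value chosen for z is an assumption made only for illustration.

```python
import math

x, y = 800.0, -400.0
z = 0.0  # illustrative assumption

# Principle 1: exp(x) * exp(y) fails because exp(800) overflows,
# while exp(x + y) = exp(400) is representable.
try:
    math.exp(x) * math.exp(y)
except OverflowError as err:
    print("product of exponentials fails:", err)
print(math.exp(x + y))

def log_sum_exp(a, b):
    """log(exp(a) + exp(b)) computed after shifting by m = max(a, b)."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Principle 2: after the shift, exp() only sees non-positive inputs
# and log() sees a value between 1 and 2.
print(log_sum_exp(x + y, z))
print(log_sum_exp(800.0, 0.0))
```

The last call would overflow if written as log(exp(800) + exp(0)), even though the result is an ordinary number.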
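Notations
In the following sections, we compare the two likelihood expressions $L_I(\theta|T)$ and $L_A(\theta|T)$ based on both simulation and historical data.
To distinguish between estimates with $L_I(\theta|T)$ and $L_A(\theta|T)$, we denote the estimate using $L_A(\theta|T)$ as $\hat\theta_A \equiv (\hat\alpha_A, \hat\delta_A, \hat\mu_A, \hat\varepsilon_{b,A}, \hat\varepsilon_{s,A})$ and the estimate using $L_I(\theta|T)$ as $\hat\theta_I \equiv (\hat\alpha_I, \hat\delta_I, \hat\mu_I, \hat\varepsilon_{b,I}, \hat\varepsilon_{s,I})$.
Further, we denote the PIN estimates calculated from $\hat\theta_A$ and $\hat\theta_I$ as $PIN_A$ and $PIN_I$, respectively. In general, the subscripts A and I indicate estimates from $L_A(\theta|T)$ and $L_I(\theta|T)$, respectively.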
We perform a simulation test with 2,500 hypothetical stocks, each assigned a random parameter vector $\theta = (\alpha, \delta, \mu, \varepsilon_b, \varepsilon_s)$.
$\alpha \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$ and $\delta \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$.
For each pair of $\alpha$ and $\delta$, we randomly generate 100 combinations of $\mu$, $\varepsilon_b$, and $\varepsilon_s$. $\mu \in [0, 600]$ with a probability density function $f(\mu) = 1/600$. $\varepsilon_b \in [0, 1{,}200]$ with $f(\varepsilon_b) = 1/1{,}200$. $\varepsilon_s \in [0, 1{,}200]$ with $f(\varepsilon_s) = 1/1{,}200$.
The upper bounds of $\mu$ as well as $\varepsilon_b$ and $\varepsilon_s$, 600 and 1,200, respectively, are based on the results of Yan and Zhang (2006) for NYSE and AMEX stocks between 1993 and 2004.
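The sampling design described above can be sketched as follows. The function name, the seed, and the use of Python's random module are assumptions; the slides do not say how the draws were generated.

```python
import random

GRID = [0.1, 0.3, 0.5, 0.7, 0.9]  # values used for both alpha and delta

def draw_parameters(seed=0, per_pair=100):
    """Return (alpha, delta, mu, eps_b, eps_s) tuples for the hypothetical stocks."""
    rng = random.Random(seed)  # the seed is an arbitrary choice
    stocks = []
    for alpha in GRID:
        for delta in GRID:
            for _ in range(per_pair):
                mu = rng.uniform(0.0, 600.0)       # density 1/600
                eps_b = rng.uniform(0.0, 1200.0)   # density 1/1,200
                eps_s = rng.uniform(0.0, 1200.0)   # density 1/1,200
                stocks.append((alpha, delta, mu, eps_b, eps_s))
    return stocks

stocks = draw_parameters()
print(len(stocks))  # 25 (alpha, delta) pairs times 100 draws
```

Generating daily buys and sells for each stock would additionally require Poisson draws under the tree diagram, which this sketch leaves out.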
Simulation Test - Sample
Simulation Test - Fig. 1
Simulation Test - Sketch of Fig. 1
In Panel A, most of the PIN estimates using $L_A(\theta|T)$ are located along the 45-degree line.
By contrast, most PIN estimates using $L_I(\theta|T)$ are located to the right of the 45-degree line and are systematically underestimated.
Moreover, Panels D, E, and F show that the underestimation of PIN results from the underestimation of $\mu$ and the overestimation of $\varepsilon_b + \varepsilon_s$ if these parameters are estimated with $L_I(\theta|T)$.
Simulation Test - Table 1
Table 1: Summary and test statistics for the simulation data
Variable                                        Mean       SEM(a)    Min         Median      Max        Wilcoxon Test(b)
$PIN_A - PIN$                                   0.0011     0.0003    –0.1145     0.0002      0.0951     0.0064*
$PIN_I - PIN$                                   –0.0299    0.0012    –0.6123     –0.0069     0.0951     <0.0001*
$\hat\alpha_A - \alpha$                         0.0156     0.0028    –0.9000     0.0000(c)   0.9000     0.0007*
$\hat\alpha_I - \alpha$                         0.0145     0.0031    –0.8578     –0.0000     0.9000     0.8372
$\hat\delta_A - \delta$                         0.0049     0.0033    –0.9000     0.0000      0.9000     0.3616
$\hat\delta_I - \delta$                         0.0052     0.0037    –0.9000     0.0000      0.9000     0.3883
$\hat\mu_A - \mu$                               1.3877     0.2872    –106.3085   0.1031      324.4225   0.0288
$\hat\mu_I - \mu$                               –73.5955   2.2565    –547.3298   –9.9742     324.4225   <0.0001*
$\hat\varepsilon_A - \varepsilon$ (#)           –1.6607    0.2362    –148.8821   –0.1057     32.0964    0.0104
$\hat\varepsilon_I - \varepsilon$               32.0990    1.3010    –148.8823   5.7255      421.4701   <0.0001*
$\hat\mu_A/\hat\varepsilon_A - \mu/\varepsilon$ 0.0009     0.0004    –0.4242     0.0001      0.1512     0.0188
$\hat\mu_I/\hat\varepsilon_I - \mu/\varepsilon$ –0.1189    0.0135    –30.4464    –0.0088     0.1512     <0.0001*
* Indicates significance at p < .01.
(a) SEM denotes the standard error of the mean.
(b) The column shows the p-value of the Wilcoxon signed rank test with the null hypothesis of zero mean.
(c) All zeros are due to rounding off and truncation.
(#) $\varepsilon \equiv \varepsilon_b + \varepsilon_s$ and $\hat\varepsilon \equiv \hat\varepsilon_b + \hat\varepsilon_s$, where the extra right subscript A or I is omitted.
Simulation Test - Sketch of Table 1
With $L_A(\theta|T)$, the estimate of PIN is statistically biased, though its mean deviation is only 0.0011, while the estimates of parameters are unbiased according to the Wilcoxon signed rank test (p < .01), except for $\alpha$.
$PIN_A$ is more precise than $PIN_I$ with its smaller standard error of the mean (SEM).
$\hat\alpha_A$ and $\hat\alpha_I$ may not be significantly different from each other with their close ranges, defined as the difference between the maximum and the minimum.
With $L_I(\theta|T)$, PIN is statistically significantly underestimated.
Such an underestimation may result from the underestimation of $\mu$ and the overestimation of $\varepsilon$, where $\varepsilon \equiv \varepsilon_b + \varepsilon_s$.
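Simulation Test - Table 2
Table 2: Test of unbiasedness using the regression method for simulation data
Model                                                   Regressor             Coeff. $\beta_1$   White S.E.   White Test(a) $\chi^2$   $H_0: \beta_1 = 1$(b) $\chi^2$
$E(PIN_A|PIN)$                                          $PIN$                 0.9988             0.0023       12.85*                   0.27
$E(PIN_I|PIN)$                                          $PIN$                 0.7059             0.0109       25.42*                   733.26*
$E(\hat\alpha_A|\alpha)$                                $\alpha$              1.0074             0.0039       14.50*                   3.49
$E(\hat\alpha_I|\alpha)$                                $\alpha$              0.9951             0.0040       34.85*                   1.49
$E(\hat\delta_A|\delta)$                                $\delta$              0.9910             0.0055       2.70                     2.69
$E(\hat\delta_I|\delta)$                                $\delta$              0.9787             0.0060       3.21                     12.54*
$E(\hat\mu_A|\mu)$                                      $\mu$                 1.0001             0.0005       9.04*                    0.04
$E(\hat\mu_I|\mu)$                                      $\mu$                 0.6986             0.0063       156.30*                  2,268.51*
$E(\hat\varepsilon_A|\varepsilon)$ (#)                  $\varepsilon$         0.9985             0.0002       14.16*                   42.56*
$E(\hat\varepsilon_I|\varepsilon)$                      $\varepsilon$         1.0195             0.0009       32.73*                   430.54*
$E(\hat\mu_A/\hat\varepsilon_A|\mu/\varepsilon)$        $\mu/\varepsilon$     0.9934             0.0035       1.54                     3.57
$E(\hat\mu_I/\hat\varepsilon_I|\mu/\varepsilon)$        $\mu/\varepsilon$     0.3332             0.1318       1.18                     25.60*
* Indicates significance at p < .01.
(a) The $\chi^2$ for the White test of heteroscedasticity.
(b) The Wald statistic based on the White estimator for the covariance matrix.
(#) $\varepsilon \equiv \varepsilon_b + \varepsilon_s$ and $\hat\varepsilon \equiv \hat\varepsilon_b + \hat\varepsilon_s$, where the right subscript A or I is omitted.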
Simulation Test - Sketch of Table 2
Given the simple no-intercept regression model, the regression coefficient should be one if the estimates are unbiased.
Moreover, many of these regressions are subject to heteroscedasticity. Accordingly, the table uses the White estimator for the variance to test $H_0: \beta_1 = 1$.
At the significance level 0.01, the estimates of parameters are not significantly biased for $L_A(\theta|T)$ with the exception of $\hat\varepsilon_A$, while the estimates of parameters other than $\alpha$ are biased for $L_I(\theta|T)$.
As for the coefficient in the regression $E(\hat\varepsilon_A|\varepsilon)$, which appears to be different from one, a potential explanation is that its standard error of 0.0002 is very small.
For $L_I(\theta|T)$, the underestimation of PIN may be due to both the underestimation of $\mu$ and the overestimation of $\varepsilon_b + \varepsilon_s$.
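The unbiasedness test can be sketched as a regression through the origin with a heteroscedasticity-robust standard error and a Wald statistic for a unit slope. The function name is hypothetical, and the HC0 form of the White estimator is an assumption, since the slides do not state which variant was used.

```python
def slope_through_origin(x, y):
    """Return (slope, White HC0 standard error, Wald statistic for slope = 1)."""
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - slope * xi for xi, yi in zip(x, y)]
    # White (HC0) variance for a single regressor without intercept.
    var_white = sum((xi * ei) ** 2 for xi, ei in zip(x, resid)) / sxx ** 2
    se = var_white ** 0.5
    wald = ((slope - 1.0) / se) ** 2
    return slope, se, wald

# Toy data only: "true" values x and noisy "estimates" y.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
print(slope_through_origin(x, y))
```

Under the null hypothesis the Wald statistic is compared with a chi-square distribution with one degree of freedom.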
Simulation Test - Table 3
Table 3: Regions subject to the floating-point exception bias
Regions covering true parameter values: (1) sample size; #($PIN_A$ < PIN)(a) as % of (1); #($PIN_I$ < PIN) as % of (1); (2) number of initial values; #(Con.)(b) for $L_A(\theta|T)$ as % of (2); #(Con.) for $L_I(\theta|T)$ as % of (2).
Region                                                 (1)     PIN_A<PIN   PIN_I<PIN   (2)       Con. L_A   Con. L_I
$BFS_{L_I|T}(100)$ (c)                                 188     29.79       20.74       18,845    99.79      29.22
$BFS_{L_I|T}(400) \setminus BFS_{L_I|T}(100)$          547     51.37       51.19       54,280    99.72      25.90
$BFS_{L_I|T}(710) \setminus BFS_{L_I|T}(400)$          609     48.28       48.77       60,520    99.80      22.13
$BFS \setminus BFS_{L_I|T}(710)$                       1,156   51.12       94.64(d)    121,765   99.86      19.62
(a) Number of downward-biased PIN estimates.
(b) Number of convergences for given initial values.
(c) $BFS_{L_I|T}(U)$ is defined by the same inequalities in $M_i$, $B_i$, $S_i$, $\log(x_b)$, and $\log(x_s)$, $i = 1, 2, 3, \ldots, I$, as the FPE-free set of $L_I(\theta|T)$, with the overflow threshold $E$ replaced by $U$.
(d) The value is not equal to 100 because terms of $\alpha$ and $\delta$ in $L_I(\theta|T)$ may mitigate FPE.
Simulation Test - Sketch of Table 3 (1/2)
We classify the 2,500 hypothetical stocks into different groups according to $BFS_{L_I|T}(U)$.
For a hypothetical stock, if its true parameter is in $BFS_{L_I|T}(U)$ with a small $U < E \approx 710$, then its PIN estimate may not be subject to bias.
94.64% of $PIN_I$ are underestimated when true parameters are in $BFS \setminus BFS_{L_I|T}(710) \subseteq BFS \setminus BFS^{E}_{L_I|T}$, in which the FPE occurs with a high frequency.
With $L_I(\theta|T)$, the percentage of convergence increases from 19.92% to 29.22% for different given initial values when actual parameters are distant from $BFS \setminus BFS_{L_I|T}(710)$.
Simulation Test - Sketch of Table 3 (2/2)
By contrast, $PIN_A$ is not significantly underestimated, and almost all iterations converge for the optimization of $L_A(\theta|T)$ in different regions.
Namely, the bias of $PIN_I$ is primarily caused by FPE, and when convergences of MLE are sensitive to initial values, the PIN estimate tends to be downward biased.
Moreover, the small percentages of underestimated $PIN_I$ and $PIN_A$ with $\theta$ in $BFS_{L_I|T}(100)$ are due to most observations with the true $\mu$ being too small relative to $\varepsilon_b$ and $\varepsilon_s$.
To explore the validity of the PIN estimate in predicting the opening spread, Easley et al. (1996) propose the regression model:
$\Sigma = \beta_0 + \beta_1 V \cdot PIN + \beta_2 VOL + \eta$.
$\Sigma$ is the opening spread (TAQ). $\Sigma$ is calculated as the mean daily opening spread for each stock.
$V$ is the stock price (CRSP). $V$ is calculated as the mean daily closing price.
$VOL$ denotes the mean daily dollar volume (CRSP). $VOL$ is the mean daily number of shares traded multiplied by $V$.
Empirical Evidences - Regression Test
Our sample is constructed using the NYSE's publicly available TAQ and CRSP. The initial selection comprises 1,317 NYSE-listed stocks during the fourth quarter of 2007.
First, we select the common stocks with IPO dates before October 1, 2004, that are traded every day during the sample period.
Next, we eliminate infrequently traded stocks: those with mean daily volumes below $20,000, fewer than 20 mean daily trades, or prices below $3 in any trade during the sample period.
The large number of buys and sells still leads to failure in obtaining PIN_I in 2007. Therefore, we focus on the 1,056 stocks for which both PIN_A and PIN_I can be obtained.
Empirical Evidences - Sample (1/2)
To estimate PIN, we first count the daily numbers of buys and sells for each stock.
Following Easley et al. (1996), Easley et al. (1998), Easley et al. (2002), as well as Boehmer, Grammig, and Theissen (2007), we adopt the Lee and Ready (1991) algorithm to infer trade direction during regular trading hours (9:30 a.m. to 4:00 p.m.).
We use the first quote during the regular trading hours as the opening quote. Then, we calculate the mean daily opening spread for each stock.
Finally, we retrieve both V and VOL from CRSP.
Empirical Evidences - Sample (2/2)
Lee-Ready Method: Quote Method, then Tick Method
Quote Method:
A trade is a buy if its price is greater than the spread midpoint.
A trade is a sell if its price is less than the spread midpoint.
A trade is unclassified if its price is at the spread midpoint.
Tick Method:
A trade is a buy if its price increases relative to previous prices.
A trade is a sell if its price decreases relative to previous prices.
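The two rules combine into a short classifier. The following is a minimal sketch under the description above: the function name and argument layout are hypothetical, the tick rule is applied only to trades at the midpoint, and refinements such as quote timing are not modeled.

```python
def classify_trade(price, bid, ask, prev_prices):
    """Return 'buy', 'sell', or None if the trade cannot be classified."""
    midpoint = (bid + ask) / 2.0
    # Quote method: compare the trade price with the spread midpoint.
    if price > midpoint:
        return "buy"
    if price < midpoint:
        return "sell"
    # Tick method: compare with the most recent different trade price.
    for prev in reversed(prev_prices):
        if price > prev:
            return "buy"
        if price < prev:
            return "sell"
    return None

print(classify_trade(10.5, 10.0, 10.5, []))              # above the midpoint
print(classify_trade(10.0, 10.0, 10.5, []))              # below the midpoint
print(classify_trade(10.25, 10.0, 10.5, [10.0, 10.25]))  # at the midpoint, last change was up
```

Counting the 'buy' and 'sell' labels per day gives the daily counts that enter the likelihood.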
Empirical Evidences - Lee & Ready
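[Figure: Panel A, the quote method, shows the bid, ask, and midpoint, with buys above the midpoint and sells below it. Panel B, the tick method, shows a sequence of trades between the bid and ask labeled B or S according to the direction of the price change.]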
Empirical Evidences - Fig. 2
Empirical Evidences - Sketch of Fig. 2
Panel A shows that most dots are located to the right of the 45-degree line, suggesting that most values of $PIN_I$ are underestimated relative to those of $PIN_A$.
Panels B through F show that the underestimation emerges when most values of $\hat\varepsilon_I$ are overestimated relative to those of $\hat\varepsilon_A$, and most values of $\hat\mu_I$ are underestimated relative to those of $\hat\mu_A$.
Specifically, the values of $\hat\mu_I$ are bounded by approximately 300, but the values of $\hat\mu_A$ are spread over a wide range from zero to 2,100.
Moreover, $\hat\alpha_A$ and $\hat\delta_A$ are not significantly different from $\hat\alpha_I$ and $\hat\delta_I$ in Panels C and E.
Empirical Evidences - Table 4
Table 4: Summary and comparison of estimates for 1,056 NYSE-listed stocks from TAQ
Estimate                                                       Mean         Min        Median      Max
$PIN_A$                                                        0.1396       0.0468     0.1341      0.3398
$PIN_I$                                                        0.0949       0.0286     0.0786      0.2991
$PINbias \equiv PIN_I - PIN_A$                                 –0.0447      –0.2342    –0.0426     0.0394
Relative $PINbias \equiv PINbias/PIN_A$                        –0.3179      –0.8302    –0.3523     0.4665
Percentage of $PIN_I$ underestimated relative to $PIN_A$       86.17%       -          -           -
$\hat\alpha_A$                                                 0.3638       0.0312     0.3594      0.7656
$\hat\alpha_I$                                                 0.4898       0.0313     0.4660      0.8974
$\hat\delta_A$                                                 0.5299       0.0000(a)  0.5529      1.0000
$\hat\delta_I$                                                 0.4723       0.0000     0.4535      1.0000
$\hat\mu_A$                                                    472.2234     18.6807    425.0530    2,097.9190
$\hat\mu_I$                                                    182.6015     18.6807    188.9032    309.1559
$\hat\varepsilon_A$                                            1,125.9189   18.8944    928.7753    3,888.0867
$\hat\varepsilon_I$                                            1,198.8276   18.8944    979.2572    4,109.5795
$\hat\mu_A/\hat\varepsilon_A$ (b)                              0.5028       0.2155     0.4327      6.3139
$\hat\mu_I/\hat\varepsilon_I$                                  0.2740       0.0602     0.1904      2.3425
(a) The zero is due to rounding off and truncation.
(b) $\varepsilon \equiv \varepsilon_b + \varepsilon_s$ and $\hat\varepsilon \equiv \hat\varepsilon_b + \hat\varepsilon_s$, where the right subscript A or I is omitted.
Empirical Evidences - Sketch of Table 4
This table shows that 86.17% of $PIN_I$ are underestimated relative to $PIN_A$, and the means of $PINbias$ ($\equiv PIN_I - PIN_A$) and relative $PINbias$ ($\equiv PINbias/PIN_A$) are –0.0447 and –0.3179, respectively.
Moreover, the values of $\hat\mu_A$ scatter around their median of 425.0530 over a wide range from approximately zero to 2,100.
By contrast, the median of $\hat\mu_I$ is 188.9032 and the values of $\hat\mu_I$ are bounded by approximately 310.
The result is consistent with the notion that the underestimation of $PIN_I$ results from the underestimation of $\hat\mu_I$.
Empirical Evidences - Fig. 3
Figure 3: PIN estimates with $L_A(\theta|T)$ and $L_I(\theta|T)$ against the daily number of trades.
Panel A: $PINbias$ vs. $T_{max}$ (vertical axis from -0.24 to 0.04; horizontal axis from 0 to 9,000).
Panel B: upper bounds of the PIN estimate vs. $T_{max}$ (vertical axis from 0.0 to 1.0; horizontal axis from 0 to 10,000).
Panel C: $PIN_I$ vs. ln(Trade), with fitted line PIN = -0.0399 ln(Trade) + 0.3672, $R^2$ = 0.6566.
Panel D: $PIN_A$ vs. ln(Trade), with fitted line PIN = -0.0149 ln(Trade) + 0.2414, $R^2$ = 0.1487.
Empirical Evidences - Sketch of Fig. 3 (1/2)
Panel A shows that the absolute value of $PINbias \equiv PIN_I - PIN_A$ increases with the maximum daily number of trades, denoted by $T_{max}$, for the 2007 sample.
When $T_{max}$ is less than approximately 1,000, the difference between the two estimates is negligible.
Namely, the deviations of PIN estimates from the actual measure may emerge for firms with frequent trades.
When $\varepsilon_b = \varepsilon_s$, we derive the upper bound of PIN estimates with a given $T_{max}$ for the original likelihood. We also derive the corresponding upper bound for the reformulated likelihood proposed by Easley et al. (2001) when $\varepsilon_b = \varepsilon_s$.
Empirical Evidences - Sketch of Fig. 3 (2/2)
In Panel B, if the empirical upper bound is 0.4, the FPE bias may be significant for stocks with $T_{max}$ greater than 970 or 3,350, depending on the adopted likelihood function.
Panels C and D show that the $R^2$ between $PIN_I$ and the natural log of the mean daily number of trades, denoted as ln(Trade), is 0.6566, but between $PIN_A$ and ln(Trade), it is 0.1487.
In sum, Panels A through D show that FPE bias may overstate the correlation between the PIN estimates and trading-frequency-sensitive measures, such as the daily number of trades.
This result implies that FPE is not merely random but rather is systematic with large numbers of trades.
Empirical Evidences - Table 5
Table 5: Regression test for the influence of floating-point exception bias
Model: $\Sigma = \beta_0 + \beta_1 V \cdot PIN + \beta_2 VOL + \eta$
                  General Model               Restriction to $\beta_2 = 0$     Marginal Effect
Coeff.            PIN_A         PIN_I         PIN_A          PIN_I             PIN_A         PIN_I
Intercept         0.9766        0.9367        0.9772         1.1448            0.9002        0.9002
                  (28.9437*)    (24.0413*)    (28.9526*)     (27.5519*)        (25.1248*)    (25.1248*)
V*PIN             0.2162        0.2712        0.2217         0.2995            0.2551        0.2551
                  (30.3282*)    (25.6081*)    (37.8259*)     (25.4660*)        (26.0859*)    (26.0859*)
V*PINbias                                                                      0.0915        –0.1637
                                                                               (5.7156*)     (–14.1339*)
VOL               8.84E–10      9.95E–09                                       2.86E–09      2.86E–09
                  (1.3307*)     (16.7623*)                                     (3.8644*)     (3.8644*)
Adj. R2           0.5757        0.5102        0.5753         0.3801            0.5880        0.5880
F-Value           716.5800*     550.5300*     1,430.3500*    647.9600*         502.9800*     502.9800*
* Indicates significance at p < .01.
Empirical Evidences - Sketch of Table 5 (1/2)
The regression model $\Sigma = \beta_0 + \beta_1 V \cdot PIN + \beta_2 VOL + \eta$ is proposed by Easley et al. (1996) for exploring their PIN model.
The PIN in the regression is replaced by $PIN_I$ or $PIN_A$, alternately.
The regressions use ordinary least squares, and the t-statistics are reported in parentheses.
The table shows that $PIN_A$ outperforms $PIN_I$ in explaining the spread because it leads to a greater adjusted $R^2$ in both the general model and the model restricted to $\beta_2 = 0$.
In particular, $PIN_A$ improves the adjusted $R^2$ from 0.3801 to 0.5753 for the model restricted to $\beta_2 = 0$.
Empirical Evidences - Sketch of Table 5 (2/2)
The marginal effect model: $\Sigma = \beta_0 + \beta_1 V \cdot PIN + \beta_{1,bias} V \cdot PINbias + \beta_2 VOL + \eta$, where $PINbias$ is equal to $PIN_I$ minus $PIN_A$.
With the marginal effect model, this study demonstrates that $PINbias$ provides incremental explanatory power for the spread regardless of controlling for $PIN_A$ or $PIN_I$.
The explanation is that $PIN_A$ and $PIN_I$ are substantially different measures for the probability of informed trading.
Discussion and Conclusion (1/2)
This study posits and explores the bias in PIN estimation due to FPE.
Results based on both simulation and historical data show that such a bias may be more pronounced for active stocks than for inactive stocks.
This bias may have contaminated the results of prior studies, as failing to identify such a bias may lead to overstatements of the relation between PIN and measures sensitive to trading frequency.
Discussion and Conclusion (2/2)
This study not only demonstrates the significance of FPE bias but also presents measures for mitigating such bias.
Our findings remind users of packages (e.g., SAS and MATLAB) of the significance of this inherent bias and provide them with a simple remedy.
Our results indicate that the underestimation of PIN is primarily caused by a significant downward bias regarding the arrival rate of informed trades from an inaccurate likelihood expression, $L_I(\theta|T)$, which leads to FPE.
Thanks for your listening!!