fpe 90min-all

A Computing Bias in Estimating the Probability of Informed Trading

Hsiou-Wei W. Lin, Wen-Chyan Ke

Journal of Financial Markets, forthcoming

Introduction (1/5)

Since Easley et al. (1996; 2002), who proposed a structural model to estimate the probability of informed trading (PIN) with MLE, the PIN has been widely employed in market microstructure studies.

To obtain PIN using numerical MLE, researchers need to count the daily numbers of buyer- and seller-initiated trades (or buys and sells) for a stock.

Large numbers of buys and sells may cause the power functions embedded in the likelihood to generate a value outside the range of real numbers that a computer program can represent.

Such an overflow or underflow is referred to in computer science as a floating-point exception (FPE).

Introduction(2/5)


Introduction(3/5)

The floating point is the most common representation today for real numbers by digital computers.

Its effective range is approximately $\pm 10^{308.25}$.

For example, computing $\exp(710)$, which exceeds $10^{308.25}$, results in an overflow (i.e., an FPE).
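The range quoted above can be checked directly. The following is a minimal sketch in Python, assuming IEEE double precision (which `sys.float_info` reports); the helper name `exp_overflows` is hypothetical and only illustrates where `exp()` stops being representable.

```python
import math
import sys

# Largest finite double: about 1.797e308, i.e. roughly 10**308.25.
largest = sys.float_info.max
log10_largest = math.log10(largest)

# exp(x) stays finite only while x is at most log(largest), about 709.78.
overflow_threshold = math.log(largest)


def exp_overflows(x):
    """Return True if math.exp(x) raises OverflowError."""
    try:
        math.exp(x)
    except OverflowError:
        return True
    return False


print(round(log10_largest, 2))   # exponent of the largest double
print(exp_overflows(709.0), exp_overflows(710.0))
```

Python's `math.exp` raises an exception on overflow, whereas other environments may silently return infinity; either way the likelihood value is lost.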

Introduction (4/5)

FPE narrows the set of feasible solutions for the optimization problem of MLE and may, in fact, rule out estimates at the actual parameter values.

FPE appears to have had more impact on PIN estimates as trading has become more frequent in recent years. Easley et al. (2005) do not have PIN estimates for 3.6% of the stocks in their 2001 sample. Yan and Zhang (2006) report that 3.8% of the stocks lack PIN estimates in their 2004 sample.

FPE bias may also overstate the relation between the PIN estimates and trading frequency (e.g. the daily number of trades).

We propose a remedial approach, reformulating the likelihood function.

Introduction(5/5)

Figure: Tree diagram of the trading process (Easley et al., 2002).

Information event occurs with probability $\alpha$:

    Signaling bad news with probability $\delta$: buy arrival rate $\epsilon_b$; sell arrival rate $\epsilon_s + \mu$

    Signaling good news with probability $(1-\delta)$: buy arrival rate $\epsilon_b + \mu$; sell arrival rate $\epsilon_s$

Information event does not occur with probability $(1-\alpha)$: buy arrival rate $\epsilon_b$; sell arrival rate $\epsilon_s$

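PIN Estimation (1/2)

On day $i$, the joint probability density function of $(B_i, S_i)$, the observed numbers of buys and sells, is specified by

$$
\begin{aligned}
f(B_i, S_i \mid \theta \equiv (\alpha, \delta, \mu, \epsilon_b, \epsilon_s))
={}& \alpha\delta \, \exp(-\epsilon_b)\frac{\epsilon_b^{B_i}}{B_i!} \, \exp(-(\epsilon_s+\mu))\frac{(\epsilon_s+\mu)^{S_i}}{S_i!} \\
&+ \alpha(1-\delta) \, \exp(-(\epsilon_b+\mu))\frac{(\epsilon_b+\mu)^{B_i}}{B_i!} \, \exp(-\epsilon_s)\frac{\epsilon_s^{S_i}}{S_i!} \\
&+ (1-\alpha) \, \exp(-\epsilon_b)\frac{\epsilon_b^{B_i}}{B_i!} \, \exp(-\epsilon_s)\frac{\epsilon_s^{S_i}}{S_i!}.
\end{aligned}
$$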
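To see why this density is fragile in floating point, the sketch below evaluates it term by term, exactly as written. It is an illustration only: the function name `naive_density` and the parameter values are hypothetical, and rates are assumed to be positive floats.

```python
import math


def naive_density(B, S, alpha, delta, mu, eps_b, eps_s):
    """Evaluate f(B, S | theta) by forming each factor separately."""

    def poisson(k, rate):
        # exp(-rate) * rate**k / k!, with the power and factorial formed directly
        return math.exp(-rate) * rate ** k / math.factorial(k)

    bad = alpha * delta * poisson(B, eps_b) * poisson(S, eps_s + mu)
    good = alpha * (1 - delta) * poisson(B, eps_b + mu) * poisson(S, eps_s)
    none = (1 - alpha) * poisson(B, eps_b) * poisson(S, eps_s)
    return bad + good + none


# A quiet day is fine; an active day overflows in rate**k.
small = naive_density(5, 3, 0.4, 0.3, 2.0, 4.0, 3.0)
try:
    naive_density(1000, 1000, 0.4, 0.3, 300.0, 1000.0, 1000.0)
    active_day_failed = False
except OverflowError:
    active_day_failed = True
```

With about a thousand buys and sells per day, the power `rate ** k` already exceeds the double range, so the density cannot be evaluated in this form.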

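PIN Estimation (2/2)

The estimate of $\theta$ from MLE, denoted as $\hat{\theta} \equiv (\hat{\alpha}, \hat{\delta}, \hat{\mu}, \hat{\epsilon}_b, \hat{\epsilon}_s)$, is obtained by solving the following problem:

$$
\underset{\theta \in \mathrm{BFS}}{\text{Maximize}} \; L_B(\theta \mid T), \tag{1}
$$

where $L_B(\theta \mid T) \equiv \sum_{i=1}^{I} L_B(\theta \mid B_i, S_i) \equiv \sum_{i=1}^{I} \log f(B_i, S_i \mid \theta)$; $T \equiv ((B_1, S_1), (B_2, S_2), (B_3, S_3), \ldots, (B_I, S_I))$; and $\mathrm{BFS} \equiv \{(\alpha, \delta, \mu, \epsilon_b, \epsilon_s) \mid 0 \le \alpha, \delta \le 1 \text{ and } 0 \le \mu, \epsilon_b, \epsilon_s\}$, the boundary constraint.

The PIN estimate is $\mathrm{PIN} \equiv \hat{\alpha}\hat{\mu} / (\hat{\alpha}\hat{\mu} + \hat{\epsilon}_b + \hat{\epsilon}_s)$, which is the ratio of mean informed trade to mean total trade.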
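As a small illustration of the PIN ratio, the following sketch computes it from a parameter vector. The function name `pin` and the numbers are hypothetical; note that $\delta$ does not enter the ratio.

```python
def pin(alpha, mu, eps_b, eps_s):
    """Ratio of the mean informed arrival rate to the mean total arrival rate."""
    informed = alpha * mu
    return informed / (informed + eps_b + eps_s)


# Half of the days carry information, informed rate 100, uninformed rates 25 + 25.
example = pin(alpha=0.5, mu=100.0, eps_b=25.0, eps_s=25.0)
```

Because PIN depends on $\mu$ only through $\alpha\mu$ relative to $\epsilon_b + \epsilon_s$, any numerical constraint that caps $\mu$ relative to the uninformed rates also caps PIN.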

Bias and Remedy (1/8)

Given $T$ and $L_B(\theta \mid T)$, there exists a subset of BFS:

$$
\mathrm{BFS}_{L_B|T} \equiv \{\theta \in \mathrm{BFS} \mid L_B(\theta \mid T) \text{ does not lead to the FPE}\},
$$

where $L_B(\theta \mid T)$ can be accurately expressed as a floating-point number in the computing process.

The boundary of $\mathrm{BFS}_{L_B|T}$ varies with $T$ and $L_B(\theta \mid T)$.

If we obtain a (local) solution to problem (1) via a numerical method that is subject to FPE, we are, in practice, solving the following problem (2):

$$
\underset{\theta \in \mathrm{BFS}_{L_B|T}}{\text{Maximize}} \; L_B(\theta \mid T). \tag{2}
$$

Bias and Remedy (2/8)

In the presence of FPE, problem (2) deviates from problem (1). Depending on $T$ and $L_B(\theta \mid T)$, the relation between $\hat{\theta}$ and $\hat{\theta}_{L_B|T}$, the solutions to (1) and (2), respectively, is complicated. Larger $B_i$ and $S_i$, which are likely to be observed during periods of more frequent trading, correspond to a smaller $\mathrm{BFS}_{L_B|T}$ and thus a more pronounced difference between $\hat{\theta}$ and $\hat{\theta}_{L_B|T}$.

If there is no solution to problem (2), FPE causes a selection bias, as in prior studies (e.g., Easley et al., 2002, 2005; Yan and Zhang, 2006).

If a numerical solution $\hat{\theta}_{L_B|T}$ to problem (2) is obtained, it lies in $\mathrm{BFS}_{L_B|T}$ but deviates from $\theta$, the actual parameter vector, which may be in $\mathrm{BFS} \setminus \mathrm{BFS}_{L_B|T}$. Such a situation leads to a bias in the estimate of $\theta$ and thus a downward bias in the PIN estimate.

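Bias and Remedy (3/10) － A simple example

Let $\epsilon_b = \epsilon_s = \epsilon$ and rewrite $L_B(\theta \mid B_i, S_i)$ as follows:

$$
\begin{aligned}
L(\theta \mid B_i, S_i) ={}& \log\Big[\alpha\delta \exp(-\mu)\exp\big(S_i \log(1+\mu/\epsilon)\big) + \alpha(1-\delta)\exp(-\mu)\exp\big(B_i \log(1+\mu/\epsilon)\big) + (1-\alpha)\Big] \\
&+ (B_i+S_i)\log(\epsilon) - 2\epsilon - \log(S_i!\,B_i!).
\end{aligned} \tag{S1}
$$

With $L(\theta \mid T) \equiv \sum_{i=1}^{I} L(\theta \mid B_i, S_i)$, we derive:

$$
\begin{aligned}
\mathrm{BFS}_{L|T} &\equiv \{\theta \in \mathrm{BFS} \mid L(\theta \mid T) \text{ does not lead to the FPE}\} \\
&= \{\theta \in \mathrm{BFS} \mid S_{\max}\log(1+\mu/\epsilon) < E \text{ and } B_{\max}\log(1+\mu/\epsilon) < E\} \\
&= \{\theta \in \mathrm{BFS} \mid \max(S_{\max}, B_{\max})\log(1+\mu/\epsilon) < E \approx 710\},
\end{aligned} \tag{S2}
$$

where $S_{\max} \equiv \max\{S_i, i = 1, 2, 3, \ldots, I\}$, $B_{\max} \equiv \max\{B_i, i = 1, 2, 3, \ldots, I\}$, and $E \equiv \min\{e \ge 0 \mid \text{input value } e \text{ leads to overflow for the exponential function } \exp(\cdot)\}$.

Here $(1+\mu/\epsilon)^{S_i} = \exp(S_i \log(1+\mu/\epsilon))$.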

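Bias and Remedy (4/10) － A simple example

Rewrite $L(\theta \mid B_i, S_i)$ as $\tilde{L}(\theta \mid B_i, S_i)$:

$$
\begin{aligned}
\tilde{L}(\theta \mid B_i, S_i) ={}& \log\Big[\alpha\delta \exp\big(-\mu + S_i \log(1+\mu/\epsilon)\big) + \alpha(1-\delta)\exp\big(-\mu + B_i \log(1+\mu/\epsilon)\big) + (1-\alpha)\Big] \\
&+ (B_i+S_i)\log(\epsilon) - 2\epsilon - \log(S_i!\,B_i!).
\end{aligned} \tag{S3}
$$

Let $\tilde{L}(\theta \mid T) \equiv \sum_{i=1}^{I} \tilde{L}(\theta \mid B_i, S_i)$. Then we obtain $\mathrm{BFS}_{\tilde{L}|T}$ analogous to $\mathrm{BFS}_{L|T}$:

$$
\mathrm{BFS}_{\tilde{L}|T} \equiv \{\theta \in \mathrm{BFS} \mid \tilde{L}(\theta \mid T) \text{ does not lead to the FPE}\}
= \{\theta \in \mathrm{BFS} \mid -\mu + \max(S_{\max}, B_{\max})\log(1+\mu/\epsilon) < E\}. \tag{S4}
$$

For comparison, $\mathrm{BFS}_{L|T} = \{\theta \in \mathrm{BFS} \mid \max(S_{\max}, B_{\max})\log(1+\mu/\epsilon) < E\}$. The only change is $\exp(-\mu)\exp(S_i \log(1+\mu/\epsilon)) \rightarrow \exp(-\mu + S_i \log(1+\mu/\epsilon))$.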

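Bias and Remedy (5/10) － A simple example

Assume that the unbiased estimate $\hat{\theta} \in \mathrm{BFS} \setminus \mathrm{BFS}_{L|T}$ is the solution to (1) and that $\hat{\theta}_{L|T}$ is the solution to (S5) and thus naturally in $\mathrm{BFS}_{L|T}$.

$$
\underset{\theta \in \mathrm{BFS}_{L|T}}{\text{Maximize}} \; L(\theta \mid T). \tag{S5}
$$

Assume that the estimates of $\alpha$ and $\delta$ are the same in $\hat{\theta}$ and $\hat{\theta}_{L|T}$, because the estimation of these two parameters is not the main factor leading to the overflow given their bounds of zero and one.

With $(\hat{\alpha}, \hat{\delta}, \hat{\mu}, \hat{\epsilon}, \hat{\epsilon}) \in \mathrm{BFS} \setminus \mathrm{BFS}_{L|T}$ and $(\hat{\alpha}_{L|T}, \hat{\delta}_{L|T}, \hat{\mu}_{L|T}, \hat{\epsilon}_{L|T}, \hat{\epsilon}_{L|T}) \in \mathrm{BFS}_{L|T}$, we obtain:

$$
\log(1+\hat{\mu}_{L|T}/\hat{\epsilon}_{L|T}) < \frac{E}{\max(S_{\max}, B_{\max})} \le \log(1+\hat{\mu}/\hat{\epsilon}).
$$

Thus,

$$
\mathrm{PIN}_{L|T} = \frac{1}{1 + 2\hat{\epsilon}_{L|T}/(\hat{\alpha}_{L|T}\hat{\mu}_{L|T})} < \frac{1}{1 + 2\hat{\epsilon}/(\hat{\alpha}\hat{\mu})} = \mathrm{PIN}
$$

because $\hat{\alpha}_{L|T} = \hat{\alpha}$ under the assumptions as stated.

Recall that $\mathrm{BFS}_{L|T} = \{\theta \in \mathrm{BFS} \mid \max(S_{\max}, B_{\max})\log(1+\mu/\epsilon) < E\} = \{\theta \in \mathrm{BFS} \mid \log(1+\mu/\epsilon) < E/\max(S_{\max}, B_{\max})\}$.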
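The inequality above implies a ceiling on the PIN estimate that falls as the largest daily count rises. The sketch below illustrates this for the simple example with $\epsilon_b = \epsilon_s = \epsilon$. It assumes IEEE doubles, takes $\alpha = 1$ as the most favorable case, and uses hypothetical names (`max_mu_over_eps`, `pin_upper_bound`); it is not intended to reproduce the bounds plotted in the paper.

```python
import math
import sys

# Overflow threshold E for exp(): about 709.78 for IEEE doubles.
E = math.log(sys.float_info.max)


def max_mu_over_eps(count_max, threshold=E):
    """Largest mu/eps satisfying count_max * log(1 + mu/eps) = threshold."""
    return math.expm1(threshold / count_max)


def pin_upper_bound(count_max, alpha=1.0, threshold=E):
    """PIN = 1 / (1 + 2*eps/(alpha*mu)) evaluated at the largest feasible mu/eps."""
    ratio = max_mu_over_eps(count_max, threshold)
    return 1.0 / (1.0 + 2.0 / (alpha * ratio))


# count_max plays the role of max(S_max, B_max); the values are hypothetical.
bound_moderate = pin_upper_bound(1000)
bound_active = pin_upper_bound(5000)
```

The ceiling tightens as `count_max` grows, which is the mechanism by which active stocks receive lower PIN estimates under the overflow-prone expression.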

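Bias and Remedy (7/10)

$L_I(\theta \mid T) \equiv \sum_{i=1}^{I} L_I(\theta \mid B_i, S_i)$ is proposed by Easley et al. (2005); $L_I(\theta \mid B_i, S_i)$ is obtained by reformulating $L_B(\theta \mid B_i, S_i)$ as

$$
\begin{aligned}
L_I(\theta \mid B_i, S_i) ={}& \log\Big[\alpha\delta\exp(-\mu)\,x_b^{B_i-M_i}x_s^{-M_i} + \alpha(1-\delta)\exp(-\mu)\,x_b^{-M_i}x_s^{S_i-M_i} + (1-\alpha)\,x_b^{B_i-M_i}x_s^{S_i-M_i}\Big] \\
&+ B_i\log(\epsilon_b+\mu) + S_i\log(\epsilon_s+\mu) - (\epsilon_b+\epsilon_s) + M_i\big(\log(x_b)+\log(x_s)\big) - \log(S_i!\,B_i!),
\end{aligned}
$$

where $M_i \equiv \min(B_i, S_i) + \max(B_i, S_i)/2$, $x_b \equiv \epsilon_b/(\mu+\epsilon_b)$, and $x_s \equiv \epsilon_s/(\mu+\epsilon_s)$.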

Construct $\mathrm{BFS}^{E}_{L_I|T} \equiv \{\theta \in \mathrm{BFS} \mid L_I(\theta \mid T) \text{ does not lead to the FPE}\}$ analogous to $\mathrm{BFS}_{L_B|T}$. Its defining inequalities bound, by $E$, the exponents built from $M_i(\log(x_b) + \log(x_s))$, $B_i\log(x_b)$, $S_i\log(x_s)$, and $B_i\log(x_b) + S_i\log(x_s)$ for every $i = 1, 2, 3, \ldots, I$,

where $E > 0$ denotes the minimum of the numbers leading to overflow for the exponential function $\exp(\cdot)$ in the computing process.

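Bias and Remedy (8/10)

With the following $\mathrm{BFS}^{L}_{L_I|T}$ and $\mathrm{BFS}^{U}_{L_I|T}$, we could argue that $L_I(\theta \mid T)$ results in downward-biased PIN estimates for stocks with a large number of trades.

$$
\mathrm{BFS}^{E}_{L_I|T} \supseteq \mathrm{BFS}^{L}_{L_I|T} \equiv \{\theta \in \mathrm{BFS} \mid -M_{\max}\big(\log(x_b)+\log(x_s)\big) < E\}
$$

and

$$
\mathrm{BFS}^{E}_{L_I|T} \subseteq \mathrm{BFS}^{U}_{L_I|T} \equiv \{\theta \in \mathrm{BFS} \mid M_{\max}\max\big(-\log(x_b), -\log(x_s)\big) < E\},
$$

where $M_{\max} \equiv \max\{M_i, i = 1, 2, 3, \ldots, I\}$.

Each of $\log(x_b)$ and $\log(x_s)$ is less than zero, and for active stocks the terms multiplied by $M_{\max}$ dominate.

$M_{\max} \uparrow \;\Rightarrow\; \log(x_b) \text{ and } \log(x_s) \rightarrow 0 \;\Rightarrow\; x_b = \epsilon_b/(\mu+\epsilon_b) \text{ and } x_s = \epsilon_s/(\mu+\epsilon_s) \rightarrow 1 \;\Rightarrow\; \mu \rightarrow 0 \;\Rightarrow\; \mathrm{PIN} = 1/(1 + (\epsilon_b+\epsilon_s)/(\alpha\mu))$ is underestimated.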
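The chain of implications can be illustrated numerically. The sketch below checks whether a factor of the form $x^{-M}$, which appears inside the log of $L_I$, is representable as a double. The names and values (`x_ratio`, `negative_power_overflows`, `M_max = 1500`) are hypothetical and chosen only for illustration.

```python
def x_ratio(eps, mu):
    """x = eps / (mu + eps), as in the definitions of x_b and x_s."""
    return eps / (mu + eps)


def negative_power_overflows(x, M):
    """Return True if x ** (-M) cannot be represented as a float."""
    try:
        x ** (-M)
    except OverflowError:
        return True
    return False


M_max = 1500.0                               # hypothetical value for an active stock
x_large_mu = x_ratio(eps=500.0, mu=500.0)    # x = 0.5
x_small_mu = x_ratio(eps=500.0, mu=5.0)      # x close to 1

overflow_with_large_mu = negative_power_overflows(x_large_mu, M_max)
overflow_with_small_mu = negative_power_overflows(x_small_mu, M_max)
```

Only parameter vectors with $\mu$ small relative to $\epsilon_b$ and $\epsilon_s$ remain computable when $M_{\max}$ is large, so an optimizer restricted to computable points is pushed toward a small $\mu$ and hence a small PIN.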

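Bias and Remedy (9/10)

To reduce the FPE bias, we first present the accurate likelihood expression $L_A(\theta \mid T) \equiv \sum_{i=1}^{I} L_A(\theta \mid B_i, S_i)$, where $L_A(\theta \mid B_i, S_i)$ is as follows:

$$
\begin{aligned}
L_A(\theta \mid B_i, S_i) ={}& \log\Big[\alpha\delta\exp(e_{1i}-e_{\max i}) + \alpha(1-\delta)\exp(e_{2i}-e_{\max i}) + (1-\alpha)\exp(e_{3i}-e_{\max i})\Big] \\
&+ B_i\log(\epsilon_b+\mu) + S_i\log(\epsilon_s+\mu) - (\epsilon_b+\epsilon_s) + e_{\max i} - \log(S_i!\,B_i!),
\end{aligned}
$$

where $e_{1i} \equiv -\mu - B_i\log(1+\mu/\epsilon_b)$, $e_{2i} \equiv -\mu - S_i\log(1+\mu/\epsilon_s)$, $e_{3i} \equiv -B_i\log(1+\mu/\epsilon_b) - S_i\log(1+\mu/\epsilon_s)$, and $e_{\max i} \equiv \max(e_{1i}, e_{2i}, e_{3i})$.

Construct $\mathrm{BFS}^{E}_{L_A|T}$ analogous to $\mathrm{BFS}_{L_B|T}$ as follows:

$$
\mathrm{BFS}^{E}_{L_A|T} \equiv \{\theta \in \mathrm{BFS} \mid L_A(\theta \mid T) \text{ does not lead to the FPE}\}
= \{\theta \in \mathrm{BFS} \mid \max(e_{1i}-e_{\max i},\, e_{2i}-e_{\max i},\, e_{3i}-e_{\max i}) < E,\; i = 1, 2, 3, \ldots, I\} = \mathrm{BFS},
$$

where $e_{1i}-e_{\max i}$, $e_{2i}-e_{\max i}$, and $e_{3i}-e_{\max i}$ are never positive and are therefore less than $E > 0$ for each $\theta \in \mathrm{BFS}$.

For $\mu > 0$, the argument of each log function, $1+\mu/\epsilon_b$ or $1+\mu/\epsilon_s$, is greater than 1, so each of $e_{1i}$, $e_{2i}$, and $e_{3i}$ is less than 0.

The reformulation uses $\exp(x)\,b^{y}c^{z} = \exp(x + y\log(b) + z\log(c))$.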
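The expression $L_A$ translates almost line by line into code. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, it assumes $0 < \alpha < 1$, $0 < \delta < 1$, $\mu \ge 0$, and $\epsilon_b, \epsilon_s > 0$, and it uses `math.lgamma` for the log-factorials and `math.log1p` for $\log(1+\mu/\epsilon)$, which are choices made here for numerical convenience.

```python
import math


def log_lik_day(B, S, alpha, delta, mu, eps_b, eps_s):
    """Log density of one day's (B, S) using the max-shifted expression."""
    e1 = -mu - B * math.log1p(mu / eps_b)
    e2 = -mu - S * math.log1p(mu / eps_s)
    e3 = -B * math.log1p(mu / eps_b) - S * math.log1p(mu / eps_s)
    e_max = max(e1, e2, e3)
    # Every exponent below is <= 0, so exp() cannot overflow.
    mix = (alpha * delta * math.exp(e1 - e_max)
           + alpha * (1 - delta) * math.exp(e2 - e_max)
           + (1 - alpha) * math.exp(e3 - e_max))
    return (math.log(mix)
            + B * math.log(eps_b + mu) + S * math.log(eps_s + mu)
            - (eps_b + eps_s) + e_max
            - math.lgamma(B + 1) - math.lgamma(S + 1))


def log_lik_total(days, alpha, delta, mu, eps_b, eps_s):
    """Sum the daily log densities over days = [(B_1, S_1), ..., (B_I, S_I)]."""
    return sum(log_lik_day(B, S, alpha, delta, mu, eps_b, eps_s) for B, S in days)


# Hypothetical active-stock day: thousands of trades, still finite.
active_value = log_lik_day(4000, 3500, 0.4, 0.3, 300.0, 3000.0, 3000.0)
```

In practice one would pass the negative of `log_lik_total` to a bounded numerical optimizer; the choice of optimizer is outside what the slides specify.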

Bias and Remedy (10/10)

Based on $L_B(\theta \mid T)$, $L_A(\theta \mid T)$ is derived from the following two principles:

In computing $e^{x}e^{y}$ (or $x e^{y}$), the expression $e^{x+y}$ (or $\mathrm{sign}(x)\,e^{\log(|x|)+y}$) is more stable than $e^{x}e^{y}$ (or $x e^{y}$). For example, when $x = 800$ and $y = -400$, $e^{x}e^{y} = e^{800}e^{-400}$ requires the overflowing factor $e^{800}$, whereas $e^{x+y} = e^{400}$ does not.

In computer arithmetic, the absolute computing error of a function $f(x)$ increases with $|f'(x)|$. Therefore, a large input value for $\exp(\cdot)$ and, conversely, a small positive input value for $\log(\cdot)$ should be avoided. Namely, $\log(e^{(x+y)-m} + e^{z-m}) + m$ with $m \equiv \max(x+y, z)$ is more accurate than $\log(e^{x+y} + e^{z})$ in computing. For example, $x = 800$, $y = -400$, z00.
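Both principles can be demonstrated in a few lines. The sketch below uses $x = 800$ and $y = -400$ from the text; the value of `z` is a hypothetical choice made here, and the function names are illustrative.

```python
import math


def product_overflows(x, y):
    """Return True if forming exp(x) * exp(y) factor by factor overflows."""
    try:
        math.exp(x) * math.exp(y)
    except OverflowError:
        return True
    return False


def log_sum_exp(a, b):
    """Compute log(exp(a) + exp(b)) after shifting both exponents by m = max(a, b)."""
    m = max(a, b)
    return math.log(math.exp(a - m) + math.exp(b - m)) + m


x, y = 800.0, -400.0
z = 300.0   # hypothetical value chosen for this sketch

first_principle_needed = product_overflows(x, y)   # exp(800) alone overflows
combined = math.exp((x + y) / 100.0)               # exp(x + y) itself is fine at this scale
stable = log_sum_exp(x + y, z)                     # about 400
```

The shift by $m$ is the same device used in $L_A$, where $e_{\max i}$ plays the role of $m$.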

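Notation

In the following sections, we compare the two different likelihood expressions $L_I(\theta \mid T)$ and $L_A(\theta \mid T)$ based on both simulation and historical data.

To distinguish between estimates with $L_I(\theta \mid T)$ and $L_A(\theta \mid T)$, we denote the estimate using $L_A(\theta \mid T)$ as $\hat{\theta}_A \equiv (\hat{\alpha}_A, \hat{\delta}_A, \hat{\mu}_A, \hat{\epsilon}_{b,A}, \hat{\epsilon}_{s,A})$ and the estimate using $L_I(\theta \mid T)$ as $\hat{\theta}_I \equiv (\hat{\alpha}_I, \hat{\delta}_I, \hat{\mu}_I, \hat{\epsilon}_{b,I}, \hat{\epsilon}_{s,I})$.

Further, we denote the PIN estimates calculated from $\hat{\theta}_A$ and $\hat{\theta}_I$ as $\mathrm{PIN}_A$ and $\mathrm{PIN}_I$, respectively. In general, the subscripts A and I indicate estimates from $L_A(\theta \mid T)$ and $L_I(\theta \mid T)$, respectively.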

We perform a simulation test with 2,500 hypothetical stocks, each assigned a random parameter vector $\theta = (\alpha, \delta, \mu, \epsilon_b, \epsilon_s)$.

$\alpha \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$ and $\delta \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$.

For each pair of $\alpha$ and $\delta$, we randomly generate 100 combinations of $\mu$, $\epsilon_b$, and $\epsilon_s$: $\mu \in [0, 600]$ with a probability density function $f(\mu) = 1/600$; $\epsilon_b \in [0, 1{,}200]$ with $f(\epsilon_b) = 1/1{,}200$; and $\epsilon_s \in [0, 1{,}200]$ with $f(\epsilon_s) = 1/1{,}200$.

The upper bounds of $\mu$ (600) and of $\epsilon_b$ and $\epsilon_s$ (1,200) are based on the results of Yan and Zhang (2006) for NYSE and AMEX stocks between 1993 and 2004.
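The sampling design can be sketched as follows. This only draws the 2,500 parameter vectors; the number of trading days per stock and the generation of daily counts are not stated on the slides and are left out. The function name and the fixed seed are hypothetical.

```python
import random


def draw_parameter_vectors(seed=0, per_pair=100):
    """Draw (alpha, delta, mu, eps_b, eps_s) for each grid pair of alpha and delta."""
    rng = random.Random(seed)          # seeded for reproducibility (an assumption)
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    vectors = []
    for alpha in grid:
        for delta in grid:
            for _ in range(per_pair):
                mu = rng.uniform(0.0, 600.0)        # density 1/600
                eps_b = rng.uniform(0.0, 1200.0)    # density 1/1,200
                eps_s = rng.uniform(0.0, 1200.0)    # density 1/1,200
                vectors.append((alpha, delta, mu, eps_b, eps_s))
    return vectors


stocks = draw_parameter_vectors()
```

The 25 grid pairs times 100 draws give the 2,500 hypothetical stocks described above.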

Simulation Test － Sample

Simulation Test － Fig. 1

Simulation Test － Sketch of Fig. 1

In Panel A, most of the PIN estimates using $L_A(\theta \mid T)$ are located along the 45-degree line.

By contrast, most PIN estimates using $L_I(\theta \mid T)$ are located to the right of the 45-degree line and are systematically underestimated.

Moreover, Panels D, E, and F show that the underestimation of PIN results from the underestimation of $\mu$ and the overestimation of $\epsilon_b + \epsilon_s$ if these parameters are estimated with $L_I(\theta \mid T)$.

Simulation Test － Table 1

Table 1: Summary and test statistics for the simulation data

Variable                                            Mean       SEM(a)   Min         Median      Max        Wilcoxon test(b)
$\mathrm{PIN}_A - \mathrm{PIN}$                     0.0011     0.0003   -0.1145     0.0002      0.0951     0.0064*
$\mathrm{PIN}_I - \mathrm{PIN}$                     -0.0299    0.0012   -0.6123     -0.0069     0.0951     <0.0001*
$\hat{\alpha}_A - \alpha$                           0.0156     0.0028   -0.9000     0.0000(c)   0.9000     0.0007*
$\hat{\alpha}_I - \alpha$                           0.0145     0.0031   -0.8578     -0.0000     0.9000     0.8372
$\hat{\delta}_A - \delta$                           0.0049     0.0033   -0.9000     0.0000      0.9000     0.3616
$\hat{\delta}_I - \delta$                           0.0052     0.0037   -0.9000     0.0000      0.9000     0.3883
$\hat{\mu}_A - \mu$                                 1.3877     0.2872   -106.3085   0.1031      324.4225   0.0288
$\hat{\mu}_I - \mu$                                 -73.5955   2.2565   -547.3298   -9.9742     324.4225   <0.0001*
$\hat{\epsilon}_A - \epsilon$ (#)                   -1.6607    0.2362   -148.8821   -0.1057     32.0964    0.0104
$\hat{\epsilon}_I - \epsilon$                       32.0990    1.3010   -148.8823   5.7255      421.4701   <0.0001*
$\hat{\mu}_A/\hat{\epsilon}_A - \mu/\epsilon$       0.0009     0.0004   -0.4242     0.0001      0.1512     0.0188
$\hat{\mu}_I/\hat{\epsilon}_I - \mu/\epsilon$       -0.1189    0.0135   -30.4464    -0.0088     0.1512     <0.0001*

* Indicates significance at p < .01.

(a) SEM denotes the standard error of the mean.

(b) The column shows the p-value of the Wilcoxon signed rank test with the null hypothesis of zero mean.

(c) All zeros are due to rounding off and truncation.

(#) $\epsilon \equiv \epsilon_b + \epsilon_s$ and $\hat{\epsilon} \equiv \hat{\epsilon}_b + \hat{\epsilon}_s$, where the extra right subscript A or I is omitted.

Simulation Test － Sketch of Table 1

With $L_A(\theta \mid T)$, the estimate of PIN is statistically biased, although the mean error is only 0.0011, and the parameter estimates are unbiased according to the Wilcoxon signed rank test at the .01 level, except for $\hat{\alpha}_A$.

$\mathrm{PIN}_A$ is more precise than $\mathrm{PIN}_I$, as shown by its smaller standard error of the mean (SEM).

$\hat{\alpha}_A$ and $\hat{\alpha}_I$ may not be significantly different from each other given their close ranges, defined as the difference between the maximum and the minimum.

With $L_I(\theta \mid T)$, PIN is statistically significantly underestimated.

Such an underestimation may result from the underestimation of $\mu$ and the overestimation of $\epsilon$, where $\epsilon \equiv \epsilon_b + \epsilon_s$.

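Simulation Test － Table 2

Table 2: Test of unbiasedness using the regression method for simulation data

Model                                                                   $\beta$   White S.E.   White test(a) ($\chi^2$)   $H_0$: $\beta = 1$(b) ($\chi^2$)
$E(\mathrm{PIN}_A \mid \mathrm{PIN}) = \beta\,\mathrm{PIN}$             0.9988    0.0023       12.85*                     0.27
$E(\mathrm{PIN}_I \mid \mathrm{PIN}) = \beta\,\mathrm{PIN}$             0.7059    0.0109       25.42*                     733.26*
$E(\hat{\alpha}_A \mid \alpha) = \beta\alpha$                           1.0074    0.0039       14.50*                     3.49
$E(\hat{\alpha}_I \mid \alpha) = \beta\alpha$                           0.9951    0.0040       34.85*                     1.49
$E(\hat{\delta}_A \mid \delta) = \beta\delta$                           0.9910    0.0055       2.70                       2.69
$E(\hat{\delta}_I \mid \delta) = \beta\delta$                           0.9787    0.0060       3.21                       12.54*
$E(\hat{\mu}_A \mid \mu) = \beta\mu$                                    1.0001    0.0005       9.04*                      0.04
$E(\hat{\mu}_I \mid \mu) = \beta\mu$                                    0.6986    0.0063       156.30*                    2,268.51*
$E(\hat{\epsilon}_A \mid \epsilon) = \beta\epsilon$ (#)                 0.9985    0.0002       14.16*                     42.56*
$E(\hat{\epsilon}_I \mid \epsilon) = \beta\epsilon$                     1.0195    0.0009       32.73*                     430.54*
$E(\hat{\mu}_A/\hat{\epsilon}_A \mid \mu/\epsilon) = \beta\,\mu/\epsilon$   0.9934    0.0035       1.54                       3.57
$E(\hat{\mu}_I/\hat{\epsilon}_I \mid \mu/\epsilon) = \beta\,\mu/\epsilon$   0.3332    0.1318       1.18                       25.60*

* Indicates significance at p < .01.

(a) The $\chi^2$ for the White test of heteroscedasticity.

(b) The Wald statistic based on the White estimator for the covariance matrix.

(#) $\epsilon \equiv \epsilon_b + \epsilon_s$ and $\hat{\epsilon} \equiv \hat{\epsilon}_b + \hat{\epsilon}_s$, where the right subscript A or I is omitted.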

Simulation Test － Sketch of Table 2

Given the simple no-intercept regression model, the regression coefficient should be one if the estimates are unbiased.

Moreover, many of these regressions are subject to heteroscedasticity. Accordingly, the table uses the White estimator for the variance to test $H_0$: $\beta = 1$.

At the significance level of 0.01, the parameter estimates are not significantly biased for $L_A(\theta \mid T)$, with the exception of $\hat{\epsilon}_A$, while the estimates of parameters other than $\alpha$ are biased for $L_I(\theta \mid T)$.

The coefficient in the regression $E(\hat{\epsilon}_A \mid \epsilon)$ appears to be different from one; a potential explanation is that its standard error of 0.0002 is very small.

For $L_I(\theta \mid T)$, the underestimation of PIN may be due to both the underestimation of $\mu$ and the overestimation of $\epsilon_b + \epsilon_s$.

Simulation Test － Table 3

Table 3: Regions subject to the floating-point exception bias

Regions covering true parameter values                                     Sample size (1)   #($\mathrm{PIN}_A$ < PIN)(a)/(1), % of (1)   #($\mathrm{PIN}_I$ < PIN)/(1), % of (1)   Number of initial values (2)   #(Con.)(b)/(2) for $L_A$, % of (2)   #(Con.)/(2) for $L_I$, % of (2)
$\mathrm{BFS}^{(100)}_{L_I|T}$ (c)                                         188               29.79                                        20.74                                     18,845                         99.79                                29.22
$\mathrm{BFS}^{(400)}_{L_I|T} \setminus \mathrm{BFS}^{(100)}_{L_I|T}$      547               51.37                                        51.19                                     54,280                         99.72                                25.90
$\mathrm{BFS}^{(710)}_{L_I|T} \setminus \mathrm{BFS}^{(400)}_{L_I|T}$      609               48.28                                        48.77                                     60,520                         99.80                                22.13
$\mathrm{BFS} \setminus \mathrm{BFS}^{(710)}_{L_I|T}$                      1,156             51.12                                        94.64(d)                                  121,765                        99.86                                19.62

(a) Number of downward-biased PIN estimates.

(b) Number of convergences for the given initial values.

(c) $\mathrm{BFS}^{(U)}_{L_I|T}$ is defined in the same way as $\mathrm{BFS}^{E}_{L_I|T}$, with the overflow threshold $E$ replaced by $U$.

(d) The value is not equal to 100 because the terms of $\alpha$ and $\delta$ in $L_I(\theta \mid T)$ may mitigate FPE.

Simulation Test － Sketch of Table 3 (1/2)

We classify the 2,500 hypothetical stocks into different groups according to $\mathrm{BFS}^{(U)}_{L_I|T}$.

For a hypothetical stock, if its true parameter vector is in $\mathrm{BFS}^{(U)}_{L_I|T}$ with a small $U < E \approx 710$, then its PIN estimate may not be subject to bias.

94.64% of the $\mathrm{PIN}_I$ estimates are underestimated when the true parameters are in $\mathrm{BFS} \setminus \mathrm{BFS}^{(710)}_{L_I|T}$, which approximates $\mathrm{BFS} \setminus \mathrm{BFS}^{E}_{L_I|T}$ and in which the FPE occurs with a high frequency.

With $L_I(\theta \mid T)$, the percentage of convergence for the given initial values increases from 19.92% to 29.22% as the actual parameters move away from $\mathrm{BFS} \setminus \mathrm{BFS}^{(710)}_{L_I|T}$.

Simulation Test － Sketch of Table 3 (2/2)

By contrast, $\mathrm{PIN}_A$ is not significantly underestimated, and almost all iterations converge for the optimization of $L_A(\theta \mid T)$ in the different regions.

Namely, the bias of $\mathrm{PIN}_I$ is primarily caused by FPE, and when convergence of the MLE is sensitive to initial values, the PIN estimate tends to be downward biased.

Moreover, the small percentages of underestimated $\mathrm{PIN}_I$ and $\mathrm{PIN}_A$ in $\mathrm{BFS}^{(100)}_{L_I|T}$ arise because, for most observations in that region, the true $\mu$ is too small relative to $\epsilon_b$ and $\epsilon_s$.

Empirical Evidences － Regression Test

To explore the validity of the PIN estimate in predicting the opening spread, Easley et al. (1996) propose the regression model:

$\Sigma = \beta_0 + \beta_1 V \cdot \mathrm{PIN} + \beta_2 \mathrm{VOL} + \eta$.

$\Sigma$ is the opening spread (TAQ), calculated as the mean daily opening spread for each stock.

$V$ is the stock price (CRSP), calculated as the mean daily closing price.

VOL denotes the mean daily dollar volume (CRSP), calculated as the mean daily number of shares traded multiplied by $V$.

Empirical Evidences － Sample (1/2)

Our sample is constructed using the NYSE's publicly available TAQ and CRSP. The initial procedure selects 1,317 NYSE-listed stocks during the fourth quarter of 2007.

First, we select the common stocks with IPO dates before October 1, 2004, that are traded every day during the sample period.

Next, we eliminate infrequently traded stocks: those with mean daily volumes below $20,000, fewer than 20 mean daily trades, or prices below $3 in any trade during the sample period.

The large number of buys and sells still leads to failure in obtaining $\mathrm{PIN}_I$ in 2007. Therefore, we focus on the 1,056 stocks for which both $\mathrm{PIN}_A$ and $\mathrm{PIN}_I$ can be obtained.

To estimate PIN, we first count the daily numbers of buys and sells for each stock.

Following Easley et al. (1996), Easley et al. (1998), Easley et al. (2002), as well as Boehmer, Grammig, and Theissen (2007), we adopt the Lee and Ready (1991) algorithm to infer trade direction during regular trading hours (9:30 a.m. to 4:00 p.m.).

We use the first quote during the regular trading hours as the opening quote. Then, we calculate the mean daily opening spread as $\Sigma$ for each stock.

Finally, we retrieve both $V$ and VOL from CRSP.

Empirical Evidences － Sample (2/2)

Empirical Evidences － Lee & Ready

Lee-Ready Method: Quote Method, then Tick Method

Quote Method:

    A trade is a buy if its price is greater than the spread midpoint.

    A trade is a sell if its price is less than the spread midpoint.

    A trade is unclassified if its price is at the spread midpoint.

Tick Method:

    A trade is a buy if its price increases relative to previous prices.

    A trade is a sell if its price decreases relative to previous prices.

[Diagram, Panel A: The quote method. Trades above the midpoint between the bid and the ask are buys; trades below it are sells.]

[Diagram, Panel B: The tick method. Each trade is marked B or S according to the price change relative to previous prices.]
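The two rules above can be combined as in the following sketch. It implements only what the slide states: the quote method first, then the tick method for trades at the midpoint. The function name, the layout of `previous_prices`, and the decision to look back to the most recent different price are assumptions made here; details of the published algorithm that the slide does not mention (such as quote timing) are not modeled.

```python
def classify_trade(price, bid, ask, previous_prices):
    """Classify one trade as 'buy', 'sell', or 'unclassified'.

    previous_prices lists earlier trade prices, oldest first (an assumed layout).
    """
    midpoint = (bid + ask) / 2.0
    # Quote method: compare the trade price with the spread midpoint.
    if price > midpoint:
        return "buy"
    if price < midpoint:
        return "sell"
    # Tick method for trades at the midpoint: compare with the most
    # recent earlier price that differs from the current price.
    for earlier in reversed(previous_prices):
        if price > earlier:
            return "buy"
        if price < earlier:
            return "sell"
    return "unclassified"


# Hypothetical quotes: bid 10.00, ask 10.50, midpoint 10.25.
above_mid = classify_trade(10.375, 10.0, 10.5, [])
at_mid_uptick = classify_trade(10.25, 10.0, 10.5, [10.0, 10.25])
```

Counting the 'buy' and 'sell' labels per day gives the $(B_i, S_i)$ pairs that enter the likelihood.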

Empirical Evidences － Fig. 2

Empirical Evidences － Sketch of Fig. 2

Panel A shows that most dots are located to the right of the 45-degree line, suggesting that most values of $\mathrm{PIN}_I$ are underestimated relative to those of $\mathrm{PIN}_A$.

Panels B through F show that the underestimation emerges when most values of $\hat{\epsilon}_I$ are overestimated relative to those of $\hat{\epsilon}_A$, and most values of $\hat{\mu}_I$ are underestimated relative to those of $\hat{\mu}_A$.

Specifically, the values of $\hat{\mu}_I$ are bounded by 300, but the values of $\hat{\mu}_A$ are spread over a wide range from zero to 2,100.

Moreover, $\hat{\alpha}_A$ and $\hat{\delta}_A$ are not significantly different from $\hat{\alpha}_I$ and $\hat{\delta}_I$ in Panels C and E.

Empirical Evidences － Table 4

Table 4: Summary and comparison of estimates for 1,056 NYSE-listed stocks from TAQ

Estimate                                                                  Mean         Min        Median     Max
$\mathrm{PIN}_A$                                                          0.1396       0.0468     0.1341     0.3398
$\mathrm{PIN}_I$                                                          0.0949       0.0286     0.0786     0.2991
PINbias $\equiv \mathrm{PIN}_I - \mathrm{PIN}_A$                          -0.0447      -0.2342    -0.0426    0.0394
Relative PINbias $\equiv$ PINbias$/\mathrm{PIN}_A$                        -0.3179      -0.8302    -0.3523    0.4665
Percentage of $\mathrm{PIN}_I$ underestimated relative to $\mathrm{PIN}_A$   86.17%       -          -          -
$\hat{\alpha}_A$                                                          0.3638       0.0312     0.3594     0.7656
$\hat{\alpha}_I$                                                          0.4898       0.0313     0.4660     0.8974
$\hat{\delta}_A$                                                          0.5299       0.0000(a)  0.5529     1.0000
$\hat{\delta}_I$                                                          0.4723       0.0000     0.4535     1.0000
$\hat{\mu}_A$                                                             472.2234     18.6807    425.0530   2,097.9190
$\hat{\mu}_I$                                                             182.6015     18.6807    188.9032   309.1559
$\hat{\epsilon}_A$                                                        1,125.9189   18.8944    928.7753   3,888.0867
$\hat{\epsilon}_I$                                                        1,198.8276   18.8944    979.2572   4,109.5795
$\hat{\mu}_A/\hat{\epsilon}_A$ (b)                                        0.5028       0.2155     0.4327     6.3139
$\hat{\mu}_I/\hat{\epsilon}_I$                                            0.2740       0.0602     0.1904     2.3425

(a) The zero is due to rounding off and truncation.

(b) $\epsilon \equiv \epsilon_b + \epsilon_s$ and $\hat{\epsilon} \equiv \hat{\epsilon}_b + \hat{\epsilon}_s$, where the right subscript A or I is omitted.

Empirical Evidences － Sketch of Table 4

This table shows that 86.17% of the $\mathrm{PIN}_I$ estimates are underestimated relative to $\mathrm{PIN}_A$, and the means of PINbias and relative PINbias are -0.0447 and -0.3179, respectively.

Moreover, the values of $\hat{\mu}_A$ scatter around their median of 425.0530 over a wide range from approximately zero to 2,100.

By contrast, the median of $\hat{\mu}_I$ is 188.9032 and the values of $\hat{\mu}_I$ are bounded by approximately 310.

The result is consistent with the notion that the underestimation of $\mathrm{PIN}_I$ results from the underestimation of $\hat{\mu}_I$.

Empirical Evidences － Fig. 3

Figure 3: PIN estimates with $L_A(\theta \mid T)$ and $L_I(\theta \mid T)$ against the daily number of trades

A. PINbias vs. $T_{\max}$

B. Upper bounds of the PIN estimates vs. $T_{\max}$

C. $\mathrm{PIN}_I$ vs. ln(Trade); fitted line: PIN = -0.0399 ln(Trade) + 0.3672, $R^2$ = 0.6566

D. $\mathrm{PIN}_A$ vs. ln(Trade); fitted line: PIN = -0.0149 ln(Trade) + 0.2414, $R^2$ = 0.1487

Empirical Evidences － Sketch of Fig. 3 (1/2)

Panel A shows that the absolute value of PINbias $\equiv \mathrm{PIN}_I - \mathrm{PIN}_A$ increases with the maximum daily number of trades, denoted by $T_{\max}$, for the 2007 sample.

When $T_{\max}$ is less than approximately 1,000, the difference between the two estimates is negligible.

Namely, the deviations of PIN estimates from the actual measure may emerge for firms with frequent trades.

When $\epsilon_b = \epsilon_s$, we derive the upper bound of PIN estimates for a given $T_{\max}$ under the original likelihood function. We also derive the corresponding upper bound for the reformulated likelihood function proposed by Easley et al. (2001) when $\epsilon_b = \epsilon_s$.

Empirical Evidences － Sketch of Fig. 3 (2/2)

In Panel B, if the empirical upper bound is 0.4, the FPE bias may be significant for stocks with $T_{\max}$ greater than 970 or 3,350, depending on the adopted likelihood function.

Panels C and D show that the $R^2$ between $\mathrm{PIN}_I$ and the natural log of the mean daily number of trades, denoted as ln(Trade), is 0.6566, but between $\mathrm{PIN}_A$ and ln(Trade) it is 0.1487.

In sum, Panels A through D show that FPE bias may overstate the correlation between the PIN estimates and measures sensitive to trading frequency, such as the daily number of trades.

This result implies that FPE bias is not merely random but rather systematic for stocks with large numbers of trades.

Empirical Evidences － Table 5

Table 5: Regression test for the influence of floating-point exception bias

Regression model: $\Sigma = \beta_0 + \beta_1 V \cdot \mathrm{PIN} + \beta_2 \mathrm{VOL} + \eta$

              General model                   Restriction to $\beta_2 = 0$      Marginal effect
Coeff.        $\mathrm{PIN}_A$   $\mathrm{PIN}_I$   $\mathrm{PIN}_A$   $\mathrm{PIN}_I$   $\mathrm{PIN}_A$   $\mathrm{PIN}_I$
Intercept     0.9766         0.9367         0.9772         1.1448         0.9002         0.9002
              (28.9437*)     (24.0413*)     (28.9526*)     (27.5519*)     (25.1248*)     (25.1248*)
V*PIN         0.2162         0.2712         0.2217         0.2995         0.2551         0.2551
              (30.3282*)     (25.6081*)     (37.8259*)     (25.4660*)     (26.0859*)     (26.0859*)
V*PINbias     -              -              -              -              0.0915         -0.1637
                                                                          (5.7156*)      (-14.1339*)
VOL           8.84E-10       9.95E-09       -              -              2.86E-09       2.86E-09
              (1.3307*)      (16.7623*)                                   (3.8644*)      (3.8644*)
Adj. $R^2$    0.5757         0.5102         0.5753         0.3801         0.5880         0.5880
F-Value       716.5800*      550.5300*      1,430.3500*    647.9600*      502.9800*      502.9800*

* Indicates significance at p < .01.

Empirical Evidences － Sketch of Table 5 (1/2)

The regression model is $\Sigma = \beta_0 + \beta_1 V \cdot \mathrm{PIN} + \beta_2 \mathrm{VOL} + \eta$, which is proposed by Easley et al. (1996) for exploring their PIN model.

The PIN in the regression is replaced by $\mathrm{PIN}_I$ or $\mathrm{PIN}_A$, alternately.

The regressions use ordinary least squares, and the t-statistics are reported in parentheses.

The table shows that $\mathrm{PIN}_A$ outperforms $\mathrm{PIN}_I$ in explaining the spread because it leads to a greater adjusted $R^2$ in both the general model and the model restricted to $\beta_2 = 0$.

In particular, $\mathrm{PIN}_A$ improves the adjusted $R^2$ from 0.3801 to 0.5753 for the model restricted to $\beta_2 = 0$.

Empirical Evidences － Sketch of Table 5 (2/2)

The marginal effect model is $\Sigma = \beta_0 + \beta_1 V \cdot \mathrm{PIN} + \beta_{1,\mathrm{bias}} V \cdot \mathrm{PINbias} + \beta_2 \mathrm{VOL} + \eta$, where PINbias is equal to $\mathrm{PIN}_I$ minus $\mathrm{PIN}_A$.

With the marginal effect model, this study demonstrates that PINbias provides incremental explanatory power for the spread regardless of whether we control for $\mathrm{PIN}_A$ or $\mathrm{PIN}_I$.

The explanation is that $\mathrm{PIN}_A$ and $\mathrm{PIN}_I$ are substantially different measures for the probability of informed trading.

Discussion and Conclusion (1/2)

This study posits and explores the bias in PIN estimation due to FPE.

Results based on both simulation and historical data show that such a bias may be more pronounced for active stocks than for inactive stocks.

This bias may have contaminated the results of prior studies, as failing to identify such a bias may lead to overstatements of the relation between PIN and measures sensitive to trading frequency.

Discussion and Conclusion (2/2)

This study not only demonstrates the significance of FPE bias but also presents measures for mitigating such bias.

Our findings remind users of packages (e.g., SAS and MATLAB) of the significance of this inherent bias and provide them with a simple remedy.

Our results indicate that the underestimation of PIN is primarily caused by a significant downward bias in the estimated arrival rate of informed trades, which arises from an inaccurate likelihood expression, $L_I(\theta \mid T)$, that leads to FPE.

Thank you for listening!
