estimating trade elasticities using product-level data · illustration, we provide some examples of...
TRANSCRIPT
Estimating Trade Elasticities Using Product-Level Data
(Very preliminary, Please do not cite)
Juyoung Cheong⇤, Do Won Kwak†, and Kam Ki Tang ‡
Abstract
This paper estimates trade cost (tariff) elasticities using bilateral tariff data at the HS6-digit
level for 47 countries from 2002 to 2010. It extends the Helpman et al. (2008) model to incor-
porate firms’ fixed costs of exporting that vary at the pair-product level. It uses a multi-way
error component model framework to account for high-dimensional unobserved factors at the
most disaggregate level (5,310 products) including firm heterogeneity and product level mul-
tilateral resistance. Furthermore, a novel use of the within estimator is introduced because
conventional within estimator in a multi-way error component model is inconsistent with un-
balanced data. The empirical results show that there is substantial upward bias in the estimates
of trade cost elasticities in the literature. Properly accounting for zero trade flows and firm
heterogeneity using more disaggregated data provide significantly smaller estimates of trade
cost elasticities, which imply much larger welfare gains from trade.
JEL Code: C23, F14
Keywords: Trade Cost Elasticity, Tariff, Unobserved Firm Heterogeneity, Within Estimator
⇤School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]†School of Economics, University of Queensland,QLD 4072, Australia; e-mail: [email protected]‡School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]
1
1 Introduction
Trade elasticities are key parameters to quantify trade costs and welfare gains from trade. Arkolakis
et al. (2012) show that for a range of trade models, gains from trade can be consistently estimated
using two variables, the import-penetration ratio (or domestic share) and the trade elasticity with
respect to variable trade cost obtained from a gravity equation.1 The import-penetration ratio data
are regularly compiled by national statistical agencies and readily available across countries. On
the contrary, the trade elasticity is not directly observable and, thus, needs to be estimated. The
precise estimation of trade elasticities is difficult however, because of data limitations. In this
paper, we approximate the trade elasticities using the responsiveness of imports to tariffs changes.
We focus on identifying trade elasticities from the changes in tariffs because tariffs suffer less from
measurement errors than other variable trade cost elements like transportation cost.2
Many studies have attempted to empirically estimate trade elasticities (e.g. Anderson (1979),
Harrigan (1993), Hummels (2001), Baier and Bergstrand (2001), Broda and Weinstein (2006) and
Bergstrand et al. (2013)). Our estimates for the trade cost elasticities are most comparable to those
from Broda and Weinstein (2006) for the elasticities of substitution, which is typically interpreted
as the trade cost elasticities. Broda and Weinstein (2006) acknowledge that they must make sim-
plifying assumptions on the demand and supply structure to empirically implement their method;
as a result, their model does not directly address factors such as firm heterogeneity and other un-
observed determinants of firms’ export market entry decisions. Our estimates are transparent in
controlling for those factors.
Furthermore, unlike this study most of these studies estimate the elasticities using either prod-1Arkolakis et al. (2012) note that the perfect competition models in Anderson and van Wincoop (2003) and Eaton
and Kortum (2002) and the monopolistic competition models in Melitz (2003) are all in a broad class of models sharingDixit–Stiglitz preferences, one factor of production, linear cost functions, complete specialization, and iceberg tradecosts. They show that if three “macro-level” restrictions hold, then all four models will share a common estimator ofthe gains from trade — which depends only on the import-penetration ratio and a gravity-equation-based estimate ofthe “trade-cost” elasticity of trade flows (of which sv is a measure of in many models). (Bergstrand et al. (2013), p.111)
2We assume a tariff to be a form of variable trade costs, and changes of tariff will have the identical effect on tradeflows as equal changes in any other sources of variable costs such as transportation costs due to fungibility of capital.Given that transportation cost and other costs are typically measured with serious errors while tariffs are measuredmore precisely in relative sense, focusing trade costs changes form tariff changes is beneficial, especially when theyare measured at very disaggregated product level.
2
uct or firm-level data for a single exporting country or country-level data for multiple country-pairs
(abbreviated as “pairs” hereafter) due to the limited availability of disaggregate data. On the one
hand, using data for a single country cannot account for unobserved heterogeneity at the importer
level and that could potentially cause severe bias. On the other hand, using aggregate data is even
more problematic because two potentially serious biases could arise in the estimations of trade
elasticities. Firstly, at any given time, tariffs vary substantially across products for each country,
but data aggregation omits the information on substitution between products, causing severe bias
in the trade elasticity estimation.3 Secondly, unfiltered bilateral trade data typically features a large
proportion of zero observations but aggregation can hide these zero flows which has special im-
plication. Information on zero trade flows at the product level is naturally lost with aggregate data
and firm’s self-selection of exporting at the product level could not be addressed as a result. Previ-
ous studies also point out problems associated with zeros observations. For instance, AvW (2004)
and Broda and Weinstein (2006) illustrate that ignoring zero trade flows in estimating trade cost
elasticities in the gravity equation using aggregate data can cause upward bias.
Helpman et al. (2008) (denoted as HMR hereafter) show that zero trade flows in aggregate data
are due to firms’ self-selection out of foreign/export markets, and suggest a two-stage procedure to
account for the bias caused by their self-selection. However, their method is difficult to implement
with product-level data, because the exclusion restriction at their first stage estimation require a
variable that could determine firms’ export market entry decision but could not affect the volume
of exports.4
In this paper, we propose a method to estimate trade cost (tariff) elasticities using product
level data that can avoid biases from various sources; pile-ups at zero trade flows, heterogeneous
pair-product fixed costs of exporting, and product-level multilateral resistance terms (MRTs). For
highly disaggregated product-level bilateral trade data, it is crucial to have an estimator that could3Anderson and van Wincoop (2004) (denoted as AvW (2004) hereafter) demonstrate with a numerical example
that “the elasticity of substitution at the more aggregated level is entirely irrelevant.” (p. 727). They advise that: “oneshould choose elasticities at a sufficiently disaggregated level at which firms truly compete.”
4Recent theoretical studies like Chaney (2008) and Krautheim (2012) pay attention to the role of fixed costs inheterogeneous firms’ entry decision in new export markets, which is empirically supported by Koenig et al. (2010).
3
deal with the source of zero observations in unbalanced data. This is because product-level bi-
lateral trade data could have more than half of the observations being zero, as countries typically
trade only specific products with individual partners. We extend the idea from the HMR approach
in controlling for firms’ fixed costs of exporting to a product level model using various within
transformations for unbalanced data in multi-way error components models. Our proposed esti-
mator is attractive because first, it allows us to account for various unobserved factors without the
requirement for any exclusion restrictions and second, unlike the conventional within estimator
it is consistent even when being directly applied to multiple error components with high dimen-
sional indexes (e.g. importer-product-year, exporter-product-year, importer-exporter-product) for
unbalanced data (Wansbeek and Kapteyn (1989); Baltagi et al. (2001); Davis (2002)). We alge-
braically show that our proposed within estimator has following main properties with unbalanced
data. Firstly, it can remove only one error component completely without causing any inconsis-
tency. Second, the order of demeaning matters for the magnitude of inconsistency in the sense that
only the first ordered demeaning could remove one error component completely, while the second
and further ordered demeanings could remove error components partially and cause inconsistency.
Third, using partially balanced data we could obtain certain within estimators that could remove
more than one error component completely (partially balanced data is defined as the data are bal-
anced in only two or three dimensions out of four) data. Therefore, the within estimators with
partially balanced data could avoid these biases from unbalanced data as well as computational
difficulty in dealing with high-dimensional unobserved factors. Our estimator directly accounts
for unobserved product-level heterogeneity using multiple high-dimensional dimensional fixed ef-
fects.
This paper contributes to the literature in the following ways. Firstly, we construct a new bi-
lateral tariff dataset for 47 countries and about 5,310 products at the HS6-digit level, from 2002 to
2010 as they are not readily available. Few studies have used product level tariff data for multiple
countries to estimate trade cost elasticity.5 For the case of trade flows, about 70% of the observa-5Exceptions are Hummels (2001) which uses year 1992 data of sectoral imports of six countries and Caliendo and
Parro (2014) which estimate trade elasticities for 20 sectors using year 1993 data for 30 countries.
4
tions, however, are of a zero value, meaning that many pairs trade only a limited range of products.
Disaggregated product level trade data (e.g, SITC 4 or 5 digits, HS6-digit) for multiple countries’
trading partners are publicly available from various data sources such as WITS and UNComtrade
but typically only positive flows are used in the estimation.6
Secondly, factors important for firms’ foreign market entry decision, including pair-product
level fixed costs and pair-year specific fixed costs, are explicitly addressed following HMR and
Chaney (2008). In our dataset, tariffs and trade flows vary across four dimensions, namely: im-
porting country, exporting country, product, and year. This allows us to directly control for (i)
unobserved demand factors such as consumer tastes using importer-product-year fixed effects, (ii)
unobserved supply factors such as productivity changes using exporter-product-year fixed effects,
(iii) unobserved pair-product-specific fixed costs of exporting (such as information costs) using
importer-exporter-product fixed effects, and (iv) unobserved pair-time-specific fixed costs (such
as political alliance) using importer-exporter-year fixed effects. To our knowledge, our empirical
strategy is the most comprehensive in terms of accounting for these four distinct unobserved fac-
tors that are important to the estimation of trade elasticities. Our proposed novel use of the within
estimator with partially balanced data could also avoid inconsistency and computational difficul-
ties associated with using the conventional within estimator and using a large number of dummy
variables for high-dimensional unobserved factors respectively. As a result, we can perform re-
gression analyses with product-level data while avoiding omitted variable bias (OVB) caused by
product-level MRTs and unobserved fixed costs which arise from information barriers, interest-
group lobbying, and government red-tape, to name just a few.7
Our empirical results show that ignoring zero observations and any one of the demand factors,
supply factors, product-level MRTs or fixed costs at the disaggregated level could overestimate
the trade elasticities (i.e. the absolute values of the estimates are smaller than their ’true values’).6See Broda and Weinstein (2006), Baier et al. (2014), and Dutt et al. (2013) for examples.7Anderson and Yotov (2011 and 2012) show importance of product level MRTs to accounting for the demand and
supply factors. The HMR and Chaney (2008) show the importance of controlling for fixed costs that is pair-product andpair-time varying unobserved factors which are related to fixed costs due to bureaucratic corruption and inefficiencyof trade that is not specific to a sector.
5
It implies that welfare gains from trade are much larger when measured with the trade cost elas-
ticity estimated using product-level data including zero observations with proper controls. As an
illustration, we provide some examples of welfare effects using trade costs elasticity and import-
penetration ratio measures following Arkolakis et al. (2012) and compare our results and those
from previous studies. Furthermore, our results imply that substantial heterogeneity exist in the
elasticity estimates across products. We examine elasticity heterogeneity using different market
structures of products and input quality in skill levels.
The rest of the paper is organized as follows. Section 2 describes the HMR model to introduce
an estimable gravity equation at the product level data using multi-way error components model.
Section 3 describes the data. In section 4, we explain our empirical model framework and provide
the estimation results. The last section provides concluding remarks and policy implications. Sim-
ulation and sensitivity analysis for our proposed estimator results are provided in the Appendix.
2 Gravity Model
2.1 Gravity Model at Sectoral Level
In this section, the gravity model is derived for sectoral (or product) level estimations. We adopt
the HMR model but extend to multiple sectors, k (k = 1,2, ...K). For aggregate data, K = 1; for
HS 2-digit data, K = 98; and for HS 6-digit data, K ⇡ 5,400. As in Chaney (2008) and Arkolakis
et al. (2012), we further assume that consumer’s utility function is the Cobb-Douglas for products
and Dixit-Stiglitz for varieties within a product.
A representative consumer in country j ( j = 1,2, ...J) has the following utility function at time
t:
Ujt =K
’k=1
ˆh2Hjkt
x jkt(h)(sk�1)/skdh
!qksk/(sk�1)
(1)
where x jkt(h) is a differentiated variety h of product k available in j at time t, Hi jt is the set of
6
varieties available for product k in j’s country at time t, skis the elasticity of substitution between
varieties of product k , which is assumed to vary across products with sk>1 and qk is the share of
income Yjt on goods from sector k with ÂKk qk = 1.
Although consumption can change over time as Yjt changes, for simplicity we assume no bor-
rowing or lending so that the consumer is maximizing consumption period by period instead of
in an intertemporal manner. There is also no dynamic in the production side unlike Alessandria
et al. (2014). Then, with country j’s income at time t, Yjtwhich equals the expenditure at time t we
obtain country j’s demand for variety h of product k at time t from (1):
x jkt(h) =p⇤jkt(h)
�skE jkt
P1�skjkt
(2)
where p⇤jkt(h) is the price of variety h of product k in country j at time t, E jkt = qkYjt is the
expenditure by j on goods from sector k from all origins at time t, and Pjkt is the price index
associated with varieties form sector k in j at time t where
Pjkt =
"ˆh2Hjkt
p⇤jkt(h)1�skdh
#1/(1�sk)
(3)
Using firms’ standard markup pricing equation in the monopolistic competition model and
following HMR (2008), we essentially have the same outcome for i0s imports of product k from j
as HMR (2008) for each product k in each time t.
Ti jkt =
✓c jktti jkt
akPit
◆1�sk
EiktNjktVi jkt (4)
where Ti jkt is the import of product k by country i from country j at time t, Njkt is the number of all
firms in sector k in j at time t, Vi jkt is a monotone function of the fraction of j’s firms that export
to i in sector k (which is based on firms’ relative productivity and fixed export costs to i) at time t.
c jkt is a measure of average productivity of firms in sector k in j at time , but different firms have
7
different productivity based on akt , ti jkt is the variable trade cost in sector k exported from j to i,
and (1�sk) is the trade elasticity with respect to variable trade cost for product k.
As in HMR, we also have
(Pikt)1�sk =
J
Âj=1
✓c jktti jkt
ak
◆1�sk
NjktVi jkt (5)
Vi jkt =
ˆ ai jkt
aLkt
(akt)1�sk dG(akt)
=gag�sk+1
Lt
(g �sk +1)(agHkt
�agLkt)Wi jkt (6)
Wi jkt = max
(✓ai jkt
aLkt
◆g�sk+1�1,0
)(7)
In the baseline specification, HMR assume that firm productivity 1/akt is truncated Pareto
distributed to the support [aLkt,aHkt ] so that G(akt) = (agkt � ag
Lkt)/(agHkt � ag
Lkt),g > (sk � 1). Pikt
is corresponding to i’s MRTs in product k at time t as introduced by Anderson and van Wincoop
(2003), but it additionally takes into account the impact of firm heterogeneity with the fixed costs
of exporting on prices in product k.
The model can be solved for the following estimable equation:
lnTi jkt = (sk �1)lnak � (sk �1)lnc jkt + lnNjkt +(sk �1)lnPikt
+lnEikt � (sk �1)lnti jkt + lnVi jkt (8)
Pikt , and Njkt and c jkt are unobserved, but they can be accounted for by using exporter-sector-
specific factors and importer-sector-specific factors. Thus, with Wi jkt, being a monotonic function
8
of Vi jkt , we can obtain the following estimable equation:
lnTi jtk =� (sk �1)lnti jkt + lnWi jkt +u1ikt + v1 jkt (9)
where u1ikt =(sk�1)lnPikt +lnEikt +(sk�1)lna , and v1 jkt = lnNjkt �(sk�1)lnc jkt +ln(gag�sk+1Lt )�
ln(g �sk +1)� ln(agHkt
�agLkt).
Here we depart from HMR and further assume that lnWi jkt could be decomposed into a number
of additively separable components:
lnWi jkt = w2i jk +h2i jt +u2ikt + v2 jkt + e2i jkt (10)
where w2i jk represents time-invariant pair-sector specific fixed costs, h2i jt is pair-time specific
random shock to fixed costs, u2ikt is importer-sector-time specific random shock to fixed costs,
v2 jkt is exporter-sector-time specific random shock to fixed costs, and e2i jkt is all remaining random
elements that could affect fixed costs of firm’s export market entry decision. These five factors
determine the fraction of firms that export from j to i in sector k at time t.
Likewise, we further assume that non-tariff-related variable trade costs can be decomposed into
five additively separable factors such that:
lnti jkt = ln(1+ tari f fi jkt)+w3i jk +h3i jt +u3ikt + v3 jkt + e3i jkt (11)
where w3i jk +h3i jt +u3ikt + v3 jkt + e3i jkt is the linear approximation of non-tariff-related variable
trade costs.
Substituting (10) and (11) into (4), we obtain the following equation:
lnTi jtk = a0 � (sk �1) · ln(1+ tari f fi jkt)+wi jk +hi jt +uikt + v jkt + ei jkt (12)
where wi jk ⌘ w2i jk � (sk � 1)w3i jk, hi jt ⌘ h2i jt � (s � 1)h3i jt , uikt ⌘ u1ikt + u2ikt � (sk � 1)u3ikt ,
v jkt ⌘ v1 jkt + v2 jkt � (sk � 1)v3 jkt , and ei jkt ⌘ e2i jkt � (sk � 1)e3i jkt +(s �sk)h3i jt with s is the
9
average of sk. As wi jk, hi jt , uikt , and v jkt are unobservable, we model them using fixed effects.
Here (sk �1) can be consistently estimated if we can assume
Cov(ln(1+ tari f fi jkt),ei jkt |wi jk,hi jt ,uikt ,v jkt) = 0 (13)
That is, we assume that once we have controlled for all i jt-, i jk-, ikt-, and jkt-varying factors,
the remaining trade costs factors that could vary in all four dimensions of i, j,k and t simultaneously
(i.e. ei jkt) are uncorrelated with tari f fi jkt . One possible scenario in which this assumption is
violated is that, when a country lowers its tariffs on certain imports from a specific source country
as a results of, for instance, trade agreement negotiations, it might provide assistance to the affected
industries or erect non-tariff barriers on those products as a compensation (Ray, 1981; Limao and
Tovar, 2011). However, as long as the assistance or non-tariff barriers are industry specific but not
industry and source country specific, they would be controlled for by uikt .
Also, the consistency condition (13) is not conditional on Ti jkt > 0 unlike the HMR method.
In order to directly incorporate zero trade flows in the estimation, we replace lnTi jkt in (9) with
ln(Ti jkt +1).
2.2 Self-Selection and Firm Heterogeneity
A key insight of HMR is that zero trade flows commonly observed in bilateral trade data are re-
sults of firms’ decision of self-selection into (or out of) foreign market and the decision is not
uniform across country due to firm heterogeneity.8 HMR suggest a two-stage procedure to account
for self-selection and firm heterogeneity for aggregate data. The first-stage estimation requires an
exclusion restriction variable that affects firms’ entrance into a foreign market but not their trade
volume.9 In order to extend the HMR approach to estimate product level models, the first stage8As we use product level data, we assume that firms’ self-selection of exporting occurs at pair-product level for the
rest of paper.9HMR first consider bilateral regulation measure as the exclusion restriction variable. Besides strong assumption
of excludability in the model for trade volume, a practical limitation of this exclusion restriction is data availability,which prompts HMR to consider an alternative variable of an index of common religion (between any pair). Data oncommon religion has been available only for sporadic periods until Maoz and Henderson (2013) construct a world
10
estimation requires an exclusion restriction variable that affects firms’ entrance into individual
product markets in a foreign country but not their trade volumes in each of those markets. How-
ever, finding an exclusion restriction that varies over pair-product, importer-product-time, and/or
exporter-product-time is extremely difficult due to data limitation.
To avoid data problems, we propose an estimator that allows us to by-pass the first-stage esti-
mation in the HMR’s two-stage procedure while still controlling for self-selection and firm hetero-
geneity. We assume that unobserved factors associated with self-selection and firm heterogeneity
could be approximated by four additively separable factors, wi jk +hi jt + uikt + v jkt , then we can
write an estimable equation using a multi-way error components model. Furthermore, it could
be estimated by the within estimator that treat unobserved factors wi jk +hi jt + uikt + v jkt as fixed
effects.
Therefore, when using disaggregate panel data, our estimator is superior to the two-stage HMR
method in several ways. First, it avoids data limitations as mentioned above. Second, fixed
effects are free from measurement errors and validity assumption associated with exclusion re-
striction (e.g. the religion variable). Next, besides pair-time unobserved factors associated with
self-selection and firm heterogeneity, it allows us to additionally control for pair-product, importer-
product-time, and exporter-product-time unobserved factors associated with self-selection and firm
heterogeneity using fixed effects..
Furthermore, those multi-dimensional fixed effects account for other unobserved factors as
well. For instance, importer (exporter)-product-time fixed effects additionally control for product-
time varying MRTs and exporter’s supply (importer’s demand), which are crucial to obtain unbi-
ased estimates, but again very difficult to collect at fairly dissagregate product level.
religion dataset for every five years from 1945 to 2010. While not to depreciate the value of this improved dataset onreligion, we should be cautious about the measurement errors due to the challenging nature in measuring them at thefirst place.
11
3 Data
We construct two sets of data. The first data set is HS 6-digit import data from UN COMTRADE
for 47 countries from 2002 to 2010 for about 5,310 products. The second data set is bilateral tariff
data from TRAINS of UNCTAD for the same sample. More detailed explanations of the data are
provided in the Appendix.
For each pair of country and any product with available tariffs, unreported entries of trade flows
are replaced with zero. In our data only positive observations are reported. Because every country
in our sample imports from other sample countries for all sample periods in aggregate level, we
presume that unreported entries at product level are zeros.
Conceptually, balanced data mean that for all pairs of countries (i j), products (k) and time (t),
observations are available for both the dependent variable (y) and all explanatory variables (X).
In practice, bilateral trade data can never be truly balanced. This is because, data is ’naturally’
unbalanced because countries do not conduct international trade with themselves so that Tii is not
defined for all i. Furthermore, there are some cases where we do not observe a country’s tariff rate
for a specific product for entire sample years. For example, in the cases of Argentina, no tariffs are
reported for about 20 out of 5,310 products. We leave them as missing to be truthful to the data.10
Finally, some product codes are removed, combined with others, or newly introduced and these
products could induce missing data over time. We define partially balanced data to deal with this
type of missing data.
Besides balanced data, we also consider unbalanced and partially balanced data (i.e. only part
of dimension is balanced among four dimensions) for the comparison purpose in evaluating our
estimator and conventional within estimator. The use of the unbalanced data in most of previous
studies is inevitable as they use only positive trade flows, i.e. zero trade flows are not imputed in10These two types of missing data together make up only less than 3% of all data. Because the proportion of
missing is very small, we regard the data as (almost) balanced in practical sense if there are not other sources ofmissing data. Extensive simulation experiments are performed with various data generating processes to examine theeffect of missing data for our proposed within estimators. Simulation results show that there is no size bias of theestimators. Results can be found in Kwak et al. (2012) for three-way error components model and in the appendix forfour-way error components model .
12
estimations. The degree of unbalancedness depends on the level of data aggregation, increasing up
to about 70% for HS6-digit data in our sample. We show that the size bias of conventional within
estimator increases with missing proportion in unbalanced data so that more care is required in
using within estimator with disaggregate data to avoid bias.
4 Empirical Framework
4.1 A Gravity Equation with 4-way Error Components
Our estimation equation is obtained from (12) in Section 2. :
ln(Ti jkt +1) = a0 +a · ln(1+ tari f fi jkt)+wi jk +hi jt +uikt + v jkt + ei jkt (14)
where ln(Ti jkt + 1) is log trade flows, tari f fi jkt is a tariff that importing country i imposes on
product k from exporting country j at time t. wi jk, hi jt , uikt and v jkt are pair-product, pair-
time, importer-product-time, and exporter-product-time heterogeneity, respectively and ei jkt is
an idiosyncratic error term. Our primary object of interests is to obtain a consistent estimate of
a = (1�s), the elasticity of trade costs. This specification implies that, in contrast to the the-
oretical model in section 2, here we assume the trade elasticity, 1�sk, is homogeneous across
all products so that 1�sk = 1�s . We relax this assumption later to adherent to the theoretical
model.
To control for unobserved factors, we adopt fixed effects following standard approach in the
studies.. However, unlike in 3-way error components (3WEC) model followed by most of studies
in the literature (e.g., Baier and Bergstrand (2007); Magee (2008); Baier et al. (2014) ) using fixed
effects in the 4-way error components (4WEC) model may not be trivial depending on degree
of balancedness of data. Using the dummy variable method is computationally challenging and
infeasible in most statistical package while within estimator could be inconsistent unless data are
13
balanced.11 In this sub-section, we discuss within estimators for different degrees of balancedness
of data and suggest a method to adopt in our estimations.
4.1.1 Within Estimator for Balanced Data
As provided in the Appendix, with balanced data and conditional exogeneity assumption on ei jkt ,
the conventional LSDV (Least Squared Dummy Variables) estimator to eq.(??) is numerically
equivalent to the pooled Least Squared (LS) estimator (See Wooldridge (2009); Abrevaya (2013)
for two dimensional panel data cases) in the following equation with the within transformed vari-
ables:
yi jkt = Xi jkta + ei jkt (15)
where multiple transformations that remove all four unobserved factors is defined as follows:
yi jkt = yi jkt � yi jk·� yi j·t � yi·kt � y· jkt
+yi j··+ yi·k·+ yi··t + y· jk·+ y· j·t + y··kt (16)
�yi···� y· j··� y··k·� y···t + y····
In eq.(16) a dot in the subscript denotes the averaging dimension. For instance, yi jk·=1
Ti jkÂT
t=1 si jktyi jkt ,
where si jt is an indicator for observability of both yi jkt and Xi jkt (i.e. si jkt = 1 if both yi jkt
and Xi jkt are observed, and zero otherwise), and Ti jk = ÂTt=1 si jkt , so that Ti jk = T for balanced
11Multi-way error components model are typically estimated using dummy variables (e.g. Antweiler (2001); Baltagiet al. (2001); Davis (2002) among others) or within transformations of variables to remove all or some of these errorcomponents (e.g. Matyas and Balazsi (2012) and Kwak et al. (2012)). However, an extension of these methods toestimate product level data are very difficult. Firstly, the dummy variables method confronts computation difficultiesarising from the large size of the data set. Antweiler (2001); Baltagi et al. (2001); Davis (2002) propose estimatorsthat can be applied to unnested and unbalanced bilateral trade data; in practice, however, implementation of theirmethod requires to specify the transformation matrix and to generate too many dummy variables. As a result, we arenot able to find any empirical study of gravity models implementing this type of estimators. On the other hand, thewithin-transformation based estimators generally are biased with unbalanced panel data although they do not sufferfrom computational difficulty.
14
data but Ti jk < T for unbalanced data.12 We denote averaging over other dimension as fol-
lows: yi·kt =1
ni jtÂn
j=1 si jtyi jt , nikt = Ânj=1 si jkt , y· jkt =
1m jkt
Âmi=1 si jktyi jkt and m jkt = Âm
i=1 si jkt , yi·k·
= 1Tik
ÂTt=1
1nikt
Ânj=1 si jktyi jkt . We can similarly define the rest of averaging terms in eq.(16).
The four error components, wi jk, hi jt , uikt , and v jkt , can be removed by demeaning over the
dimensions of t, k, j, and i, respectively. Our estimation strategy is to remove all four error compo-
nents by using four distinct within transformations. Four successive demeanings need to be applied
and these give yi jkt as the dependent variable in eq.(16). Also note that the order of demeaning in
t, k, j, and i does not matter for balanced data but it matters for the bias size in unbalanced data.
The transformations of X and e are denoted as Xi jt and ei jt , respectively, could be similarly de-
fined as in (16). We call eq.(16) the four-way within transformation as it removes all 4WECs. The
four-way within estimator is consistent only when the data is balanced and conditional exogeneity
assumption on ei jkt is satisfied; for unbalanced data it is inconsistent.13
Our sample includes 47 largest countries countries in terms of GDP in 2007 but excludes some
countries such as India that do not have tariffs data at least for one partner country. Therefore,
for each country at a given year, both tariff data and trade flows data imputed with zeros are
available for HS 6-digit products for all partner countries. Therefore, tariff data are balanced for
pair and product but not year. This is because each year variety of products vary over some years.
For instance, in 2007 reclassification occurs for some HS 6-digit products. In the process, some
product codes are removed, combined with others, or newly introduced. Therefore, data are not
balanced in strict sense. We name this partially balanced data and provide further discussion on
this in section 4.2.3.12Using selection indicator notation is very powerful to show our algebraic results as it manifests the sources of bias
for the conventional within estimators in unbalanced data.13We assume conditional exogeneity of idiosyncratic error terms throughout the paper. For the within estimator
in (15), non-zero terms remain after the within transformation. Moreover, these terms are not mean-zero and do notgo to zero as the sample size increases. These terms are all zero if si jkt = 1 for all observations as in the case ofbalanced data, but once si jkt becomes random non-zero terms, additional terms remain after transformation and causeinconsistency. See Matyas and Balazsi (2012); Kwak et al. (2012) for the detailed discussion of within transformationfor 3WEC models and Monte Carlo evidence.
15
4.1.2 Within Estimator for Unbalanced Data
In general, the within estimator that removes multiple error components in unbalanced data is
inconsistent. Bias arises when eq.(16) is applied to unbalanced data and its size depends upon
which of error components are the source of bias and the order of demeaning. Unlike the case
of balanced data, the values of the last eleven terms in eq.(16) (i.e. all terms that are averaging
two or more dimensions yi j··, yi·k·, yi··t , y· jk·, ...y...) now depend on the order of averaging. For these
eleven terms, we use superscripts to denote the order of averaging over various dimensions. For
instance, y jti·k· is obtained by averaging over the j-dimension first and the t-dimension second, i.e.
y jti·k· =
1Tik
ÂTt=1
1nikt
Ânj=1 si jktyi jkt , while yt j
i·k· is obtained the reversed order of averaging over the
two dimensions, i.e. yt ji·k· =
1nik
Ânj=1
1Ti jk
ÂTt=1 si jktyi jkt . The mean values depend on the order of
averaging, y jti·k· 6= yt j
i·k· unless by coincidence. This is because si jkt depends on all i, j, k, and t.
As a result, 1Tik
ÂTt=1
1nikt
Ânj=1 si jkt and 1
niktÂn
j=11
TikÂT
t=1 si jkt become distinctively different values.
Similarly, the mean values of the other eleven terms in eq.(16) depend upon the order of averaging.
Finally, for the y··· (i.e. yi jkt , yi jtk, yit jk, yitk j, yikt j, y jk jt , ......), there are 24 distinct sequences of four
dimensional averaging and their values are different each. For a 4WEC model, there are a total of
24 different orders of averaging and thus 24 different transformations based on eq.(16).
For any unbalanced data, we use the property of the within estimator that one of the error
components in eq.(??) can be completely removed by the within transformation without causing
any bias. For our purpose, the important implication is that t-dimensional demeaning, if performed
very first, can remove the unobserved factor wi jk:
si jktwi jk �1
Ti jk
T
Ât=1
si jktwi jk = si jktwi jk �wi jk1
Ti jk
T
Ât=1
si jkt = si jktwi jk �wi jk1
Ti jkTi jk = si jktwi jk �wi jk
where first equality hold because wi jk does not depend on t. Thus, for observed value, si jkt = 1,
si jktwi jk �1
Ti jk
T
Ât=1
si jktwi jk = si jktwi jk �wi jk = 0 (17)
16
Similarly, one error component can be removed by the within transformation such that k-
dimensional demeaning, i-dimensional demeaning, and j-dimensional demeaning, if performed
very first, can remove hi jt , uikt , and v jkt , respectively. Obviously, since only one of these four
demeanings can be performed first, three unobserved factors cannot be completely removed and,
thus, potentially become the sources of bias in the estimations.
Furthermore, less bias is retained if it is an earlier demeaning dimension that takes out the error
component that causes inconsistency.14 For instance, suppose that hi jt is the only unobserved
factor correlated with tari f fi jkt in equation (??). Let us consider demeaning over product and
four classes of demeaning orders: k · ··, ·k · ·, · · k·, · · ·k. Here k · ·· indicates demeaning over the
k dimension first (e.g. ki jt, k jti, kt ji etc.), ·k · · demeaning over the k dimension second, and so
forth. Within estimators of each of these demeaning order have different magnitudes of bias due to
omitted variable hi jt . We have already proven that the class k · ·· can completely remove OVB due
to hi jt . Simulation results in the Appendix show that absolute magnitude of OVB due to hi jt for
other classes increases as the demeaning order for k increases, i.e. k · ·· will cause no bias, ·k · · will
cause the second smallest bias and · · ·k the largest bias. A corollary of these findings is that, within
transformations can remove one error component completely and for the other error components
only partially. However, compared to the estimates from pooled OLS, the absolute magnitude of
the bias from each source is reduced from the within transformations.15
4.1.3 Within Estimator for Partially Balanced Data
Our sample is balanced in pair and product dimension but unbalanced in time dimension. In this
case, we can still design within transformations that could take out multiple error components
completely. By investigating the sources of biases and how biases arise from each demeaning of
within transformation, we could obtain a consistent within estimator using partial balanced of data,
si jkt = st . Our main insight to obtain a consistent estimator is that we remove wi jk as the first order
14The Appendix C provides some algebraic results.15Matyas and Balazsi (2012) suggest a solution is to remove one error component using within transformation, and
then model the remaining error components using dummy variables. However, in the case of using pair-product-levelpanel data, there are still way too many dummy variables to be feasibly estimate even using their method.
17
t-dimensional demeaning (note that t is not balanced dimension) and remove uikt , and v jkt as j and
i dimensional demeanings (note that data are balanced at pair level).
Let us consider transformations that remove wi jk, uikt , andv jkt completely. The demean order-
ing we use as an illustration is i- j-t. First, first order demeaning over dimension i can remove v jkt
as shown in section 4.2.2. Now, we consider uikt with double demeanings over dimensions i and j.
stuikt �1N
N
Âi=1
stuikt �1N
N
Âj=1
(stuikt �1N
N
Âi=1
stuikt) = stuikt � st1N
N
Âi=1
uikt � stuikt1N
N
Âj=1
+st1N
N
Âj=1
1 · 1N
N
Âi=1
uikt
= stuikt � st1N
N
Âi=1
uikt � stuikt + st1N
N
Âi=1
uikt ·1N
N
Âj=1
1
= stuikt � st1N
N
Âi=1
uikt � stuikt + st1N
N
Âi=1
uikt = 0
Finally, we consider wi jk with triple demeanings over dimensions i, j, and t.
stwi jk�1N
N
Âi=1
stwi jk�1N
N
Âj=1
(stwi jk�1N
N
Âi=1
stwi jk)�1T
T
Ât=1
[stwi jk�1N
N
Âi=1
stwi jk�1N
N
Âj=1
(stwi jk�1N
N
Âi=1
stwi jk)]
= stwi jk � st1N
N
Âi=1
wi jk � st1N
N
Âj=1
wi jk + st1N
N
Âj=1
1N
N
Âi=1
wi jk
�wi jk1T
T
Ât=1
st + st1N
N
Âi=1
wi jk +1T
T
Ât=1
st1N
N
Âj=1
wi jk �1T
T
Ât=1
st1N
N
Âj=1
1N
N
Âi=1
wi jk
= stwi jk � st1N
N
Âi=1
wi jk � st1N
N
Âj=1
wi jk + st1N
N
Âj=1
1N
N
Âi=1
wi jk
�wi jk + st1N
N
Âi=1
wi jk +1N
N
Âj=1
wi jk �1N
N
Âj=1
1N
N
Âi=1
wi jk
Therefore, for st = 1,we can rewrite the last equation as follows:
18
wi jk�1N
N
Âi=1
wi jk�1N
N
Âj=1
wi jk+1N
N
Âj=1
1N
N
Âi=1
wi jk�wi jk+1N
N
Âi=1
wi jk+1N
N
Âj=1
wi jk�1N
N
Âj=1
1N
N
Âi=1
wi jk = 0
and we obtain zero because all terms are canceled out each other.
In sum, using st = 1 and triple demeanings over i- j-t (or it could be shown for demeanings over
j-i-t), we algebraically show that within transformations can remove wi jk, uikt , andv jkt completely.
Thus, for all our estimations in results section, we obtain elasticity estimates from using within
transformations over i- j-t and dummy variables for hi jt unless indicated otherwise. Note that
number of dummy variables for hi jt is 20⇥19⇥9 = 3,420 and no computational difficulty arises
with less than 4,000 explanatory variables. We provide extensive simulation results for within
estimators with triple transformations and partially balanced data, st = si jkt in the Appendix.
4.2 Estimation
4.2.1 Trade Elasticities with Disaggregate data
Table 1 shows various estimation results of using HS6-digit disaggregate data. In column (1), we
only use non-zero observation while in columns (2)-(5) zero observations are included, and various
columns differ in term of the controls of unobserved factors.
At the HS6-digit level of disaggregation, about 67% of trade flow observations are zeros. The
inclusion of such large number of zero observations dramatically reduces the trade elasticity es-
timate from -3.6 down to -0.7. The reason for this is that almost all country pairs trade only a
limited number of HS6-digit level products at any time. For instance, countries that do not have
oil reserves cannot export crude oil, and developing countries do not have skilled labour to export
high-technology products. When zero trade flows are excluded from the estimation, the trade elas-
ticity estimate is determined entirely by the response of the intensive product margin of trade to the
reduction in tariffs. According to column (1), a 1% reduction in tariffs is expected to increase the
flows of existing trade link between two countries by 3.6%. When zero trade flows are included,
19
the estimate is determined by the response of both the intensive and the extensive product margins
of trade. Even if some firms are responsive to reduction in variable trade costs and start to export to
new market, the initial volume is likely to be small compared to the export volume to established
markets. As a result, a 1% reduction in tariff only increase the average trade flow per product
by a much smaller extent of 0.7%. However, it should be noted that column (2) has not properly
accounted for the product-specific unobserved heterogeneity including product-specific fixed costs
and MRTs, and therefore suffers from omitted variable bias.
As explained in Section 2, we assume these unobserved factors can be collectively decomposed
into hi jt , uikt , v jkt and wi jk. We start with controlling for i jt-varying unobserved factors in column
(3), which reduces the trade elasticity estimate marginally to -0.68. In column (4) we further
control for ikt- and jkt-varying unobserved factors. The elasticity estimate reduces from -0.68 to
-0.57. In column (5) we have the full set of controls for unobserved heterogeneity at the product
level; the additional control for i jk-varying factors further reduces the elasticity estimate to -0.43.
A comparison of columns (2) to (5) shows that the progressively more comprehensive control
of unobserved heterogeneity reduces the estimation bias. In particular, the absolute value of the
elasticity estimate is reduced by nearly 40% from -0.71 in column (2) to -0.43 in column (5). This
indicates that OVB is as sizable as aggregation bias. But bias from omitting zero observations is
by far the largest amongst the three sources of bias.
Arkolakis et al. (2012) show that for a range of trade models, including the Armington model
and new trade models with micro-foundation like Eaton and Kortum (2002) and Melitz and Otta-
viano (2008), the welfare gains from trade (compared to autarky) can be simply measured using
two statistics, the share of expenditure on domestic goods, l and the elasticity of imports with re-
spect to variable trade cost, f . Using their welfare change formula to evaluate the welfare change
in US’s year 2000 compared to Autarky, which is (1�l�1/f ) where l = 0.93, they illustrate that
the percentage change in real income needed to compensate a representative consumer for going
back to autarky is 0.7 percent to 1.4 percent depending if the trade elasticity are ranged from -5
to -10 as surveyed in Anderson and van Wincoop (2004). If we use the estimate obtained from
20
Table 1: Trade elasticities: HS 6-digit
(1) (2) (3) (4) (5)Disaggregate: HS6
Elasticity (1-s ) -3.606*** -0.705*** -0.681*** -0.571**** -0.427***(0.354) (0.123) (0.125) (0.268) (0.101)
ij, it, jt Yes Yes Yes Yes Yesijt No No Yes Yes Yes
ikt, jkt No No No Yes Yesijk No No No No Yes
Model log-linearNo. of obs. 6,016,280 18,158,824
Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.
non-zero observations as in column (1), US’s gains from trade in year 2000 is implied to be 2
percent, slightly higher shown in Arkolakis et al. (2012) still seems to be small. When we use the
estimate obtained from data including zero observations with proper product-level controls as in
column (5), the implied US’s gains from trade in year 2000 increase to 16 percent. This simple
numerical example serves the purpose that, to evaluate the welfare impact of trade liberalization,
it is paramount to have an unbiased estimate of trade elasticity.
4.2.2 The Extensive Margin
Given that the elasticity estimate is substantially smaller when the extensive product margin of
trade is added to the intensive product margin of trade, it will be interesting to know that how the
extensive product margin of trade itself responses to change in variable trade costs.
We use the following model to estimate the effect on the probability of starting import.
I(Ti jkt > 0) = F(a0 +a · tari f fi jkt +wi jk +hi jt +uikt + v jkt) (18)
21
Table 2: Probability of starting new product trade
(1) (2) (3)Disaggregate: HS6
Probability -0.011 -0.068*** -0.038***(0.009) (0.023) (0.011)
ijt Yes Yes Yesikt, jkt No Yes Yes
ijk No No YesModel linear probability model
No. of obs. 18,158,824
Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.
where I is an indicator and F(·) is normal CDF. We estimate (18) using linear probability model
that allows to directly control four multi-way error components. Table 2 shows the probability of a
country starting to export a new product to a partner country in response to a tariff cut by its partner
for that product. Given this is a probability model, the tariff enters the model in level terms instead
of in logarithmic terms. The estimated probability varies depending on what type of unobserved
heterogeneity being accounted for. In column (3) where the control for unobserved heterogeneity
is most comprehensive, it is found that on average there is a close to 3.8% chance that a country
will start importing a new product from a partner country if the latter cuts the tariff on that product
by 1 percentage point. At least in a one-year horizon, the response of the extensive trade margin to
changes in variable trade cost is rather small.
4.2.3 Products from Different Market Structures
In this section, we decompose the products into three groups using the classifications in Rauch
(1999) and examine whether goods traded on organized exchanges have larger trade elasticity.
Rauch (1999) classifies goods into three groups: commodities, reference priced goods and differ-
entiated goods. Commodities are those goods traded on organized exchanges. Reference priced
goods are the goods that reference prices of them being published in trade journals, and the rest are22
differentiated goods.16 In our sample, 5.1%, 30.7% and 64.2% of the observations are commodi-
ties, reference priced and differentiated goods, respectively.
Broda and Weinstein (2006) use a model of import demand and supply equations and find the
commodities have higher elasticity of substitution than the other goods. However, as they point
out, commodities do not necessarily imply perfectly substitutable goods. They said, “For example,
although tea is classified by Rauch as a commodity, it is surely quite differentiated. Similarly, it
is hard to see why a commodity like “dried, salted, or smoked fish” would be more homogeneous
than a reference priced good like “fresh fish” or a differentiated good like “frozen fish.”
Table 4 shows the results.17 In column (1), commodity goods have the largest trade elasticity,
followed by referenced priced goods, and then differentiated goods. This is consistent with the
expectation in Rauch (1999) and finding in Broda and Weinstein (2006). However, it can be noticed
that the trade elasticity of differentiated goods is of an unexpected positive sign. This could be
due to fact that we have not fully accounted for all product-level unobserved heterogeneity and
therefore the estimate is biased.
When we progressively control for more unobserved factors, surprisingly the relative trade
elasticities of the three types of goods reverse. In column (3), differentiate goods now have the
largest elasticity and of a correct sign, while commodities goods have the smallest and insignificant
one. The reverse of relative magnitude can be explained as the following. The world markets
for commodities are more integrated (via organized market trading for instance). As a result,
changes in tariffs on a product between a pair has greater effects on the prices of that product
elsewhere, and the feedback (or general equilibrium) effect on the trade between the pair will
also be stronger. In the Anderson and van Wincoop (2003) framework, the relative prices of
goods in the rest of the world for a pair is denoted as MRTs. In our modeling setting, product-
level MRTs are captured by ikt and jkt fixed effects. Therefore, once we have controlled for ikt-
and jkt-unobserved heterogeneity, we eliminate this feedback effect. Because the feedback effect
could be larger for goods sold in more organized market, its removal may reverse the relative
16See Rauch (1999) for details.17We use the conservative definition from Rauch (1999). Using his liberal definition yields very similar results.
23
Table 3: Trade elasticities heterogeneity according to homogeneity of goods with unobserved het-erogeneity
(1) (2) (3)Disaggregate: HS6
Differentiated goods 0.483*** -0.512 -0.482***(0.169) (0.366) (0.143)
Reference priced goods -1.444*** -0.519* -0.300***(0.151) (0.281) (0.080)
Commodity goods -1.887*** 0.097 -0.041(0.217) (0.416) (0.183)
ijt Yes Yes Yesikt, jkt No Yes Yes
ijk No No YesModel log-linear
No. of obs. 15,835,089
Notes: Some sectors that were not categorized in these three groups are dropped in the estimations. Cluster (pair)robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels,respectively. We used balanced data and the within estimator.
trade elasticities of the three types of products in column (3) as compared to column (1).
4.2.4 Products of Different Skill-Intensity
Firms’ markups depend on trade elasticity. Higher trade elasticity yields lower markups. Skill
intensity of products may indicate the firms’ market power, where greater market power sets higher
markups. Firms produce higher skill intensity products, for example, contract lenses, may have
more market power compared firms producing lower skill intensity products, for example toilet
papers.
In this section, we divide the products based on skill-intensity based on the classification of
Basu (2011).18 In our sample, 25.5%, 34.8%, 16.7% and 23% are minerals, resource and low-skill
intensive manufacturing (e.g., toilet paper), medium-skill intensive manufacturing (e.g., air con-
ditioner) and high-skill intensive manufacturing (e.g., microscopes), respectively. Table 5 shows
the results. In column (1), without proper controls at product level, the trade elasticity estimates18See Appendix for details of the classification.
24
Table 4: Trade elasticities heterogeneity according to skill-intensity of goods with unobservedheterogeneity
(1) (2) (3)Disaggregate: HS6
Minerals -1.379*** -0.214 -0.187**(0.158) (0.216) (0.078)
Resource & low-skill intensive manufacturing -0.475* 0.410 -0.474***(0.252) (0.425) (0.159)
Medium-skill intensive manufacturing 6.722*** -5.129*** -0.368**(0.458) (0.762) (0.186)
High-skill intensive manufacturing 1.692*** -0.571 -0.216(0.248) (0.654) (0.141)
ijt Yes Yes Yesikt, jkt No Yes Yes
ijk No No YesModel log-linear
No. of obs. 14,712,838
Notes: Some sectors that were not categorized in these four groups are dropped in the estimations. Cluster (pair)robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels,respectively. We used balanced data and the within estimator.
for medium and high-skill intensive manufacturing have wrong signs and statistically significant.
However, with controlling for all fixed effects in column (3), the estimates change dramatically
and turn to be of correct signs. Resource and low-skill intensive manufacturing products have the
highest elasticity and of a correct sign while high-skill intensive manufacturing products have the
lowest and insignificant one. It implies that if countries’ final goods imports are more weighted on
low and medium-skill intensive manufacturing goods, for the same trade share the welfare gains
from trade would be smaller.
5 Conclusion
Trade elasticity is a key variable in assessing the gains from trade. In particular, gains from trade are
shown to be a power function of import penetration ratio and trade elasticity that, a small decrease
in the absolute value of trade elasticity implies a large increase in gains from trade. Previous
25
empirical literature almost uniformly suggests that the trade elasticity is of an order way bigger
than unity, implying questionably small gains from trade In this paper, we show that previous
estimations suffer from various sources of bias: data aggregation, pile-ups at zero trade flows,
heterogeneous pair-product fixed costs of exporting, and product-level MRTs.
To account for these sources of bias, this paper estimated trade elasticity using an new dataset
on tariffs, a novel empirical model, and an innovative estimation method. The new dataset devel-
oped in this paper contains tariff and bilateral trade data for 47 countries from 2002 to 2010 at the
HS 6-digit product level, which covers over 5,300 products. This is the most disaggregate tariff
and bilateral trade data that are currently available for such a large number of countries and years.
This highly disaggregate dataset provides us a platform to tackle bias due to data aggregation on
the one hand, but exacerbates the problem of pile-ups at zero trade flows on the other because
each country pair tends to trade a limited range of HS6-digit level products. A proper treatment
of zero trade flows at the product level requires us to account for heterogeneous pair-product fixed
costs of exporting. We propose to deal with this problem as well as product-level MRTs using a
gravity model with 4-way error components. These error components control for observed and
unobserved heterogeneity at i jt, ikt, jkt and i jk dimensions using fixed effects. Estimating such as
4WEC models is not trivial, especially with unbalanced data, as being the case here. We therefore
suggest a novel application of the within estimator so that it is consistent even with unbalanced
data.
It is shown that, once these sources of bias are arrested, the trade elasticity estimate is reduced
by a fraction of one-tenth, implying much sizable welfare gains from trade. It is also found that
the response of the extensive product margin of trade to tariff cut is quite small at least in the short
run, leaving the intensive product margin of trade the main channel through which tariff changes
affect trade flows. Lastly, there is evidence that there are large variations in the trade elasticities
across sectors of different levels of market structures and skill-intensity.
26
A Data descriptions
For our analysis we need bilateral tariff data at product level. Bilateral tariff data at sectoral levels,
however, are not readily available and need to be newly constructed. We construct a dataset for
time-variant bilateral tariffs at the HS6-digit level for every pair in our sample as follows.19
First, we download each importing country’s HS6-digit simple average applied MFN tariffs
from the World Integrated Trade Solution (WITS). For countries engaged in a Customs Union
(CU), tariff schedules are available at the union level only. We therefore, use these CU tariff
data to generate country level MFN tariff data for the member countries of a CU. Also, for some
countries, MFN tariffs are not available for certain years. In those cases, we use the data from the
closest previous available year as a substitute. The MFN tariffs are unilateral instead of bilateral.
To obtain bilateral tariff data, for each year we apply an importing country’s MFN tariffs to all its
trading partner countries. This procedure is justifiable as our sample includes the WTO members
only.
Preferential tariffs are collected separately for applicable pairs for each year also at the HS6-
digit level. If preferential tariffs on certain products are imposed by an importing country are
applicable to multiple exporting countries simultaneously, data are observed at the group level
only. To identify which countries belong to a specific group, we use the Preference Beneficiaries
data from the Trade Analysis and Information System (TRAINS), and then generate preferential
tariff data for each pair. Like MFN tariffs, for the countries engaged in a CU, we only observe
the preferential external tariff schedules at the union level. Therefore, we generate an individual
member country’s bilateral tariff data using the CU tariff data. Preferential tariffs are also not
always reported for all years for many countries, so we use the data from the closest previous
available year as a substitute for an unavailable year.
For given pair, product and year, if data for both MFN and preferential tariffs are available,
the latter is lower with only a few exceptions. So we use the lower rate between the two that is19The tariff data are standard for all countries only up to the HS6-digit level. For more disaggregated levels,
countries do not always use the same codes to define products.
27
available.20
All data are converted to HS2007 6 digit using UN correspondence tables. Trade and tariff data
is from 2002 to 2010, so all data from 2002-2006 are converted into HS2007 product code. In Ta-
ble 3, Rauch (1999)’s SITC Revision 2 4-digit data on product homogeneity/heterogeneity are also
converted into HS2007 product code. In Table 4, in order to classify the skill requirement level, we
use Basu (2011)’s classifications. He classified the products in to 7 categories: (A) non-fuel pri-
mary commodities, (B) resource-intensive manufactures, (C) low skill- and technology-intensive
manufactures, (D) medium skill- and technology intensive manufactures, (E) high skill- and tech-
nology intensive manufactures, (F) mineral fuels, and (G) unclassified products. For presentation
purpose, we combine (A) and (F) and classify them “Minerals”, and combine (B) and (C) and
dropped (G).
20Alternatively, we could use preferential tariffs whenever they are available, otherwise use MFN tariffs. Thedifference it makes to the tariff variable after averaging over all products is negligible.
28
B The equivalence between LSDV method and the within esti-
mator for balanced data
For the sake of simplicity of illustration, we consider 3-way error component model instead of
4-way error component model. Given that both 3-way and 4-way error component models have
multiple error component, both require multiple with transformations so that the procedure is ba-
sically the same. A extension to 4-way error component model is straightforward. The following
equation is considered for an illustration of transformation:
yi jt = Xi jta +uit + v jt +wi j + ei jt
where i = 1,2, ...n and j = 1,2, ...m are country identifiers and t = 1,2, ...T the time identifier.
In matrix format, we can rewrite:
y = Xa +u+v+Dww + e (19)
where y=(y111,y112, ...,y11T ,y121, ...,y12T ,y131...,y1m1, ...,y1mT ,y211, ...,y21T ,y221, ..,ynm1, ..,ynmT )
is a m⇥n⇥T by 1 vector. In matrix format, we stack over t first, then over j and, finally, over i.
Qw = IT �Dw(D0wDw)
�1D0w
Qw =
0
BBBBBBB@
IT
IT
. . .
IT
1
CCCCCCCA
�
0
BBBBBBB@
iT
iT. . .
iT
1
CCCCCCCA
·
29
2
66666664
0
BBBBBBB@
i0T
i0T. . .
i0T
1
CCCCCCCA
0
BBBBBBB@
iT
iT. . .
iT
1
CCCCCCCA
3
77777775
�10
BBBBBBB@
i0T
i0T. . .
i0T
1
CCCCCCCA
Qw =
0
BBBBBBBBBB@
IT
IT
. . .. . .
IT
1
CCCCCCCCCCA
�
0
BBBBBBBBBB@
1T iT i0T
1T iT i0T
. . .. . .
1T iT i0T
1
CCCCCCCCCCA
Qw =
0
BBBBBBBBBB@
IT � 1T iT i0T
IT � 1T iT i0T
. . .. . .
IT � 1T iT i0T
1
CCCCCCCCCCA
where iT is a T ⇥1 column vector of 1, Qw is a m⇥n⇥T by m⇥n⇥T matrix and Dw is a m⇥n⇥T
by m⇥n matrix.
Therefore, we obtain:
Qwy =
0
BBBBBBBBBB@
IT � 1T iT i0T
IT � 1T iT i0T
. . .. . .
IT � 1T iT i0T
1
CCCCCCCCCCA
0
BBBBBBBBBBBBBBBBB@
y11
y12...
y1m
y21...
ynm
1
CCCCCCCCCCCCCCCCCA
where y11 = (y111,y112, ...,y11T )0 and, similarly, ynm = (ynm1,ynm2, ...,ymnT )0.30
Qwy =
0
BBBBBBBBBBBBBBBBB@
y111 � y11 ⌘ y⇤111
y112 � y11 ⌘ y⇤112...
y1mT � y1m ⌘ y⇤1mT
y211 � y21 ⌘ y⇤211...
ynmT � ymn ⌘ y⇤nmT
1
CCCCCCCCCCCCCCCCCA
where y11 =1T ÂT
t=1 y11t , y1m = 1T ÂT
t=1 y1mt , and ynm = 1T ÂT
t=1 ynmt . Therefore, pre-multiplying by
Qw demeans variables using the mean over t.
We can rewrite (19), after pre-multiplying it by Qw, as follows:
y⇤i jt = X⇤i jta +u⇤it + v⇤jt + e⇤i jt
In matrix format, we can rewrite:
y⇤ = X⇤a +Duu⇤+v⇤+ e⇤ (20)
where y⇤=(y⇤111,y⇤121, ...,y
⇤1m1,y
⇤112, ...,y
⇤1m2,y
⇤113...,y
⇤1m3, ...,y
⇤1mT ,y
⇤211, ...,y
⇤2m1,y
⇤212, ..,y
⇤n1T , ..,y
⇤nmT )
is a m⇥n⇥T by 1 vector. In matrix format, we stack over j first, then over t and, finally, over i.
Qu = Im �Du(D0uDu)
�1D0u
Qu =
0
BBBBBBB@
Im
Im
. . .
Im
1
CCCCCCCA
�
0
BBBBBBB@
im
im. . .
im
1
CCCCCCCA
·
31
2
66666664
0
BBBBBBB@
i0m
i0m. . .
i0m
1
CCCCCCCA
0
BBBBBBB@
im
im. . .
im
1
CCCCCCCA
3
77777775
�10
BBBBBBB@
i0m
i0m. . .
i0m
1
CCCCCCCA
Qu =
0
BBBBBBBBBB@
Im
Im
. . .. . .
Im
1
CCCCCCCCCCA
�
0
BBBBBBBBBB@
1m imi0m
1m imi0m
. . .. . .
1m imi0m
1
CCCCCCCCCCA
where im is a m⇥1 column vector of 1, Qu is a m⇥n⇥T by m⇥n⇥T matrix and Du is a m⇥n⇥T
by n⇥T matrix.
Therefore, we obtain:
Quy⇤ =
0
BBBBBBBBBB@
Im � 1m imi0m
Im � 1m imi0m
. . .. . .
Im � 1m imi0m
1
CCCCCCCCCCA
0
BBBBBBBBBBBBBBBBB@
y⇤11
y⇤12...
y⇤1T
y⇤21...
y⇤nT
1
CCCCCCCCCCCCCCCCCA
where y⇤11 = (y⇤111,y⇤121, ...,y
⇤1m1)
0 and, similarly, y⇤nT = (y⇤n1T ,y⇤n2T , ...,y
⇤nmT )
0.
32
Quy⇤ =
0
BBBBBBBBBBBBBBBBB@
y⇤111 � y⇤11 ⌘ y#111
y⇤121 � y⇤11 ⌘ y#121
...
y⇤1mT � y⇤1T ⌘ y#1mT
y⇤211 � y⇤21 ⌘ y#211
...
y⇤nmT � y⇤nT ⌘ y#nmT
1
CCCCCCCCCCCCCCCCCA
where y⇤11 = 1m Âm
j=1 y⇤1 j1, y⇤1T = 1m Âm
j=1 y⇤1 jT , and y⇤nT = 1m Âm
j=1 yn jT . Therefore, pre-multiplying
by Qu demeans variables using the mean over j. We can rewrite (20), after pre-multiplying it by
Qu, as follows:
y#i jt = X#
i jta + v#jt + e#
i jt
In matrix format, we can rewrite:
y# = X#a +Dvv# + e# (21)
We stack over i first, then over t and, finally, over j.
Qv = In �Dv(D0vDv)
�1D0v
Qv =
0
BBBBBBB@
In
In
. . .
In
1
CCCCCCCA
�
0
BBBBBBB@
1n ini0n
1n ini0n
. . .
1n ini0n
1
CCCCCCCA
where in is a n⇥1 column vector of 1, Qv is a m⇥n⇥T by m⇥n⇥T matrix and Dv is a m⇥n⇥T
by m⇥T matrix.
Therefore, we obtain:
33
Qvy# =
0
BBBBBBBBBB@
In � 1n ini0n
In � 1n ini0n
. . .. . .
In � 1n ini0n
1
CCCCCCCCCCA
0
BBBBBBBBBBBBBBBBB@
y#11
y#12...
y#1T
y#21...
y#mT
1
CCCCCCCCCCCCCCCCCA
where y#11 = (y#
111,y#211, ...,y
#n11)
0 and, similarly, y#mT = (y#
1mT ,y#2mT , ...,y
#nmT )
0.
Qvy# =
0
BBBBBBBBBBBBBBBBB@
y#111 � y#
11 ⌘ ey111
y#211 � y#
11 ⌘ ey211...
y#n1T � y#
1T ⌘ eyn1T
y#121 � y#
21 ⌘ ey121...
y#nmT � y#
mT ⌘ eynmT
1
CCCCCCCCCCCCCCCCCA
Therefore, pre-multiplying by Qv demeans variables using the mean over i. We can rewrite
(21), after pre-multiplying it by Qv as follows:
eyi jt = eXi jta +eei jt (22)
In conclusion, we establish that sequential least square dummy variables approach on (19), (20)
and (21) is equivalent to the within estimator with our proposed transformations of variables as in
(22).
34
C Within transformation for unbalanced data
For the sake of simplicity of illustration, we consider 3-way error component model instead of 4-
way error component model. Similar to in the Appendix B, both 3-way and 4-way error component
models require multiple within transformations to remove multiple unobserved heterogeneity fac-
tors so that the procedure of within transformations is basically the same. Consider the following
3-way error components model.
yi jt = Zi jtb +wi j +uit + v jt + ei jt
As shown in section 4.3.2, the first order demeaning can remove one unobserved heterogeneity
completely. On the other hand, the second and third order demeaning can only partially eliminate
unobserved heterogeneity. Incomplete elimination of uit and v jt occurs because non-first-order
summations over i, j, t provide distinct values according to the order of summation when there are
missing observations in the i and j dimensions. Equations (23), (24), and (25) provide demeaned
values for unobserved heterogeneity for wi j, uit and v jt , respectively, after the t � i� j within
transformation.
ewi j = si jtwi j �1
Ti jST
t=1si jtwi j �1
NitSN
j=1si jtwi j �1
NjtSN
i=1si jtwi j +1
NitSN
j=11
Ti jST
t=1si jtwi j
+1
NjtSN
i=11
Ti jST
t=1si jtwi j +1
NitSN
j=11
NjtSN
i=1si jtwi j �1
NitSN
j=11
NjtSN
i=11
Ti jST
t=1si jtwi j(23)
= si jwi j � si jwi j �1Ni
SNj=1si jwi j �
1Nj
SNi=1si jwi j +
1Ni
SNj=1si jwi j
+1
NjSN
i=1si jwi j +1Ni
SNj=1
1Nj
SNi=1si jwi j �
1Ni
SNj=1
1Nj
SNi=1si jwi j
= 0
where, for the second equality in (23), we use 1Ti j
STt=1si jtwi j =
1Ti j
Ti j · si jtwi j, and for i j-variant
variables si jt , N and T do not depend on t (i.e. si jt = si j,Nit = Ni, Njt = Nj).
35
ev jt = si jtv jt �1
Ti jST
t=1si jtv jt �1
NitSN
j=1si jtv jt �1
NjtSN
i=1si jtv jt +1
NitSN
j=11
Ti jST
t=1si jtv jt
+1
NjtSN
i=11
Ti jST
t=1si jtv jt +1
NitSN
j=11
NjtSN
i=1si jtv jt �1
NitSN
j=11
NjtSN
i=11
Ti jST
t=1si jtv jt
= si jtv jt �1
Ti jST
t=1si jtv jt �1
NitSN
j=1si jtv jt � si jtv jt +1
NitSN
j=11
Ti jST
t=1si jtv jt
+1
NjtSN
i=11
Ti jST
t=1si jtv jt +1
NitSN
j=1si jtv jt �1
NitSN
j=11
NjtSN
i=11
Ti jST
t=1si jtv jt (24)
= (� 1Tj
STt=1s jtv jt +
1Nt
SNj=1
1Tj
STt=1s jtv jt)� (� 1
NjtSN
i=11Tj
STt=1s jtv jt +
1Nt
SNj=1
1Njt
SNi=1
1Tj
STt=1s jtv jt)
where, for the second equality in (24), we make use of the facts that 1Njt
SNi=1si jtv jt =
1Njt
Njt ·si jtv jt =
si jtv jt , and that for jt-variant variables si jt , N and T do not depend on i (i.e. si jt = s jt , Nit = Nt ,
Ti j = Tj). In the last line in (24), the first term is not equal to zero if there are missing observations
in the j dimension and the second term is not equal to zero if there are missing observations in the
i and j dimensions.
euit = si jtuit �1
Ti jST
t=1si jtuit �1
NitSN
j=1si jtuit �1
NjtSN
i=1si jtuit +1
NitSN
j=11
Ti jST
t=1si jtuit
+1
NjtSN
i=11
Ti jST
t=1si jtuit +1
NitSN
j=11
NjtSN
i=1si jtuit �1
NitSN
j=11
NjtSN
i=11
Ti jST
t=1si jtuit
= si jtuit �1
Ti jST
t=1si jtuit � si jtuit �1
NjtSN
i=1si jtuit +1
NitSN
j=11
Ti jST
t=1si jtuit
+1
NjtSN
i=11
Ti jST
t=1si jtuit +1
NitSN
j=11
NjtSN
i=1si jtuit �1
NitSN
j=11
NjtSN
i=11
Ti jST
t=1si jtuit (25)
= (� 1Ti
STt=1situit +
1Nt
SNi=1
1Ti
STt=1situit)+(� 1
NtSN
i=1situit +1
NitSN
j=11Nt
SNi=1situit)
+(1
NitSN
j=11Ti
STt=1situit �
1Nit
SNj=1
1Nt
SNi=1
1Ti
STt=1situit)
= (� 1Ti
STt=1situit +
1Nt
SNi=1
1Ti
STt=1situit)� (� 1
NitSN
j=11Ti
STt=1situit +
1Nit
SNj=1
1Nt
SNi=1
1Ti
STt=1situit)
where, for the second equality in (25), we make use of the facts that 1Nit
SNj=1si jtuit =
1Nit
Nit ·si jtuit =
si jtuit , and that for it-variant variables si jt , N and T do not depend on j (i.e. si jt = sit , Njt = Nt ,36
Ti j = Ti). In the last line in (25), the first term is not equal to zero if there are missing observations
in the i dimension and the second term is not equal to zero if there are missing observations in the
i and j dimensions.
D Monte Carlo Simulation
For 3WEC model, simulation results for the within estimators are reported in Kwak et al. (2012).
To be added for 4WEC model.
References
Abrevaya, J., 2013. The projection approach for unbalanced panel data. The Econometrics Journal
16, 161–178.
Alessandria, G., Choi, H., Ruhl, K., 2014. Trade adjustment dynamics and the welfare gains from
trade. NBER Working Paper Series 20663.
Anderson, J. E., 1979. A theoretical foundation for the gravity equation. American Economic
Review 69 (1), 106–116.
Anderson, J. E., van Wincoop, E., 2003. Gravity with gravitas: a solution to the border puzzle.
American Economic Review 93 (1), 170–192.
Anderson, J. E., van Wincoop, E., 2004. Trade costs. Journal of Economic Literature 42 (3), 691–
751.
Antweiler, W., 2001. Nested random efects estimation in unbalanced panel data. Journal of Econo-
metrics 101, 295–313.
Arkolakis, C., Costinot, A., Rodriguez-Clare, A., February 2012. New trade models, same old
gains? American Economic Review 102 (1), 94–130.
37
Baier, S. L., Bergstrand, J. H., 2001. The growth of world trade: tariffs, transport costs, and income
similarity. Journal of International Economics 53 (1), 1–27.
Baier, S. L., Bergstrand, J. H., 2007. Do free trade agreements actually increase members’ interna-
tional trade? Journal of International Economics 71 (1), 72–95.
Baier, S. L., Bergstrand, J. H., Feng, M., 2014. Economic integration agreements and the margins
of international trade. Journal of International Economics 93 (2), 339–350.
Baltagi, B. H., Song, S., Jung, B. C., 2001. The unbalanced nested error component regression
model. Journal of Econometrics 101, 357–381.
Basu, S., 2011. Retooling trade policy in developing countries: Does technology intensity of ex-
ports matter for gdp per capita. Policy Issues in International Trade and Commodities 56.
Bergstrand, J. H., Egger, P., Larch, M., 2013. Gravity redux: Estimation of gravity-equation coeffi-
cients, elasticities of substitution, and general equilibrium comparative statics under asymmetric
bilateral trade costs. Journal of International Economics 89 (1), 110–121.
Broda, C., Weinstein, D. E., May 2006. Globalization and the gains from variety. The Quarterly
Journal of Economics 121 (2), 541–585.
Caliendo, L., Parro, F., 2014. Estimates of the trade and welfare effects of nafta. The Review of
Economic Studies, rdu035.
Chaney, T., 2008. Distorted gravity: The intensive and extensive margins of international trade.
American Economic Review 98 (4), 1707–21.
Davis, P., 2002. Estimating multi-way error components models with unbalanced data structures.
Journal of Econometrics 106, 67–95.
Dutt, P., Mihov, I., Van Zandt, T., 2013. Does wto matter for the extensive and the intensive margins
of trade? Journal of International Economics forthcoming.
38
Eaton, J., Kortum, S., 2002. Consistent estimation from partially consistent observations. Econo-
metrica 70, 1741–1779.
Harrigan, J., 1993. Oecd imports and trade barriers in 1983. Journal of International Economics
35 (1-2), 91–111.
Helpman, E., Melitz, M., Rubinstein, Y., 2008. Estimating trade flows: Trading partners and trad-
ing volumes. The Quarterly Journal of Economics 123 (2), 441–487.
Hummels, D., 2001. Toward a geography of trade costs. Tech. rep., Mimeo, Purdue University.
Koenig, P., Mayneris, F., Poncet, S., 2010. Local export spillovers in france. The European Eco-
nomic Review 54 (4), 622–641.
Krautheim, S., 2012. Heterogeneous firms, exporter networks and the effect of distance on inter-
national trade. Journal of International Economics 87 (1), 27–35.
Kwak, D. W., Cheong, J., Tang, K. K., 2012. A within estimator for three-level data: An application
to the wto effect on trade flows. mimeo, School of Economics Discussion Paper No. 501, 2012.
Limao, N., Tovar, P., 2011. Policy choice: Theory and evidence from commitment via international
trade agreements. Journal of International Economics 85 (2), 186–205.
Magee, C. S., 2008. New measures of trade creation and trade diversion. Journal of International
Economics 75 (2), 349–362.
Maoz, Z., Henderson, E. A., 2013. The world religion dataset, 1945-2010: Logic, estimates, and
trends. International Interactions 39 (3), 265–291.
Matyas, L., Balazsi, L., 2012. The estimation of multi-dimensional fixed effects panel data models.
mimeo.
Melitz, M. J., November 2003. The impact of trade on intra-industry reallocations and aggregate
industry productivity. Econometrica 71 (6), 1695–1725.39
Melitz, M. J., Ottaviano, G. I., 2008. Market size, trade, and productivity. The review of economic
studies 75 (1), 295–316.
Rauch, J. E., 1999. Networks versus markets in international trade. Journal of International Eco-
nomics 48 (1), 7–35.
Ray, E. J., 1981. The determinants of tariff and nontariff trade restrictions in the united states.
Journal of Political Economy 89 (1), 105–21.
Wansbeek, T., Kapteyn, A., 1989. Estimation of the error-components model with incomplete
panels. Journal of Econometrics 41, 341–361.
Wooldridge, J., 2009. Correlated random effects models with unbalanced panels. mimeo.
40