estimating trade elasticities using product-level data · illustration, we provide some examples of...

Estimating Trade Elasticities Using Product-Level Data

(Very preliminary, Please do not cite)

Juyoung Cheong⇤, Do Won Kwak†, and Kam Ki Tang ‡

Abstract

This paper estimates trade cost (tariff) elasticities using bilateral tariff data at the HS6-digit

level for 47 countries from 2002 to 2010. It extends the Helpman et al. (2008) model to incor-

porate firms’ fixed costs of exporting that vary at the pair-product level. It uses a multi-way

error component model framework to account for high-dimensional unobserved factors at the

most disaggregate level (5,310 products) including firm heterogeneity and product level mul-

tilateral resistance. Furthermore, a novel use of the within estimator is introduced because

conventional within estimator in a multi-way error component model is inconsistent with un-

balanced data. The empirical results show that there is substantial upward bias in the estimates

of trade cost elasticities in the literature. Properly accounting for zero trade flows and firm

heterogeneity using more disaggregated data provide significantly smaller estimates of trade

cost elasticities, which imply much larger welfare gains from trade.

JEL Code: C23, F14

Keywords: Trade Cost Elasticity, Tariff, Unobserved Firm Heterogeneity, Within Estimator

⇤School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]†School of Economics, University of Queensland,QLD 4072, Australia; e-mail: [email protected]‡School of Economics, University of Queensland, QLD 4072, Australia; e-mail: [email protected]

1

1 Introduction

Trade elasticities are key parameters to quantify trade costs and welfare gains from trade. Arkolakis

et al. (2012) show that for a range of trade models, gains from trade can be consistently estimated

using two variables, the import-penetration ratio (or domestic share) and the trade elasticity with

respect to variable trade cost obtained from a gravity equation.1 The import-penetration ratio data

are regularly compiled by national statistical agencies and readily available across countries. On

the contrary, the trade elasticity is not directly observable and, thus, needs to be estimated. The

precise estimation of trade elasticities is difficult however, because of data limitations. In this

paper, we approximate the trade elasticities using the responsiveness of imports to tariffs changes.

We focus on identifying trade elasticities from the changes in tariffs because tariffs suffer less from

measurement errors than other variable trade cost elements like transportation cost.2

Many studies have attempted to empirically estimate trade elasticities (e.g. Anderson (1979),

Harrigan (1993), Hummels (2001), Baier and Bergstrand (2001), Broda and Weinstein (2006) and

Bergstrand et al. (2013)). Our estimates for the trade cost elasticities are most comparable to those

from Broda and Weinstein (2006) for the elasticities of substitution, which is typically interpreted

as the trade cost elasticities. Broda and Weinstein (2006) acknowledge that they must make sim-

plifying assumptions on the demand and supply structure to empirically implement their method;

as a result, their model does not directly address factors such as firm heterogeneity and other un-

observed determinants of firms’ export market entry decisions. Our estimates are transparent in

controlling for those factors.

Furthermore, unlike this study most of these studies estimate the elasticities using either prod-1Arkolakis et al. (2012) note that the perfect competition models in Anderson and van Wincoop (2003) and Eaton

and Kortum (2002) and the monopolistic competition models in Melitz (2003) are all in a broad class of models sharingDixit–Stiglitz preferences, one factor of production, linear cost functions, complete specialization, and iceberg tradecosts. They show that if three “macro-level” restrictions hold, then all four models will share a common estimator ofthe gains from trade — which depends only on the import-penetration ratio and a gravity-equation-based estimate ofthe “trade-cost” elasticity of trade flows (of which sv is a measure of in many models). (Bergstrand et al. (2013), p.111)

2We assume a tariff to be a form of variable trade costs, and changes of tariff will have the identical effect on tradeflows as equal changes in any other sources of variable costs such as transportation costs due to fungibility of capital.Given that transportation cost and other costs are typically measured with serious errors while tariffs are measuredmore precisely in relative sense, focusing trade costs changes form tariff changes is beneficial, especially when theyare measured at very disaggregated product level.

2

uct or firm-level data for a single exporting country or country-level data for multiple country-pairs

(abbreviated as “pairs” hereafter) due to the limited availability of disaggregate data. On the one

hand, using data for a single country cannot account for unobserved heterogeneity at the importer

level and that could potentially cause severe bias. On the other hand, using aggregate data is even

more problematic because two potentially serious biases could arise in the estimations of trade

elasticities. Firstly, at any given time, tariffs vary substantially across products for each country,

but data aggregation omits the information on substitution between products, causing severe bias

in the trade elasticity estimation.3 Secondly, unfiltered bilateral trade data typically features a large

proportion of zero observations but aggregation can hide these zero flows which has special im-

plication. Information on zero trade flows at the product level is naturally lost with aggregate data

and firm’s self-selection of exporting at the product level could not be addressed as a result. Previ-

ous studies also point out problems associated with zeros observations. For instance, AvW (2004)

and Broda and Weinstein (2006) illustrate that ignoring zero trade flows in estimating trade cost

elasticities in the gravity equation using aggregate data can cause upward bias.

Helpman et al. (2008) (denoted as HMR hereafter) show that zero trade flows in aggregate data

are due to firms’ self-selection out of foreign/export markets, and suggest a two-stage procedure to

account for the bias caused by their self-selection. However, their method is difficult to implement

with product-level data, because the exclusion restriction at their first stage estimation require a

variable that could determine firms’ export market entry decision but could not affect the volume

of exports.4

In this paper, we propose a method to estimate trade cost (tariff) elasticities using product

level data that can avoid biases from various sources; pile-ups at zero trade flows, heterogeneous

pair-product fixed costs of exporting, and product-level multilateral resistance terms (MRTs). For

highly disaggregated product-level bilateral trade data, it is crucial to have an estimator that could3Anderson and van Wincoop (2004) (denoted as AvW (2004) hereafter) demonstrate with a numerical example

that “the elasticity of substitution at the more aggregated level is entirely irrelevant.” (p. 727). They advise that: “oneshould choose elasticities at a sufficiently disaggregated level at which firms truly compete.”

4Recent theoretical studies like Chaney (2008) and Krautheim (2012) pay attention to the role of fixed costs inheterogeneous firms’ entry decision in new export markets, which is empirically supported by Koenig et al. (2010).

3

deal with the source of zero observations in unbalanced data. This is because product-level bi-

lateral trade data could have more than half of the observations being zero, as countries typically

trade only specific products with individual partners. We extend the idea from the HMR approach

in controlling for firms’ fixed costs of exporting to a product level model using various within

transformations for unbalanced data in multi-way error components models. Our proposed esti-

mator is attractive because first, it allows us to account for various unobserved factors without the

requirement for any exclusion restrictions and second, unlike the conventional within estimator

it is consistent even when being directly applied to multiple error components with high dimen-

sional indexes (e.g. importer-product-year, exporter-product-year, importer-exporter-product) for

unbalanced data (Wansbeek and Kapteyn (1989); Baltagi et al. (2001); Davis (2002)). We alge-

braically show that our proposed within estimator has following main properties with unbalanced

data. Firstly, it can remove only one error component completely without causing any inconsis-

tency. Second, the order of demeaning matters for the magnitude of inconsistency in the sense that

only the first ordered demeaning could remove one error component completely, while the second

and further ordered demeanings could remove error components partially and cause inconsistency.

Third, using partially balanced data we could obtain certain within estimators that could remove

more than one error component completely (partially balanced data is defined as the data are bal-

anced in only two or three dimensions out of four) data. Therefore, the within estimators with

partially balanced data could avoid these biases from unbalanced data as well as computational

difficulty in dealing with high-dimensional unobserved factors. Our estimator directly accounts

for unobserved product-level heterogeneity using multiple high-dimensional dimensional fixed ef-

fects.

This paper contributes to the literature in the following ways. Firstly, we construct a new bi-

lateral tariff dataset for 47 countries and about 5,310 products at the HS6-digit level, from 2002 to

2010 as they are not readily available. Few studies have used product level tariff data for multiple

countries to estimate trade cost elasticity.5 For the case of trade flows, about 70% of the observa-5Exceptions are Hummels (2001) which uses year 1992 data of sectoral imports of six countries and Caliendo and

Parro (2014) which estimate trade elasticities for 20 sectors using year 1993 data for 30 countries.

4

tions, however, are of a zero value, meaning that many pairs trade only a limited range of products.

Disaggregated product level trade data (e.g, SITC 4 or 5 digits, HS6-digit) for multiple countries’

trading partners are publicly available from various data sources such as WITS and UNComtrade

but typically only positive flows are used in the estimation.6

Secondly, factors important for firms’ foreign market entry decision, including pair-product

level fixed costs and pair-year specific fixed costs, are explicitly addressed following HMR and

Chaney (2008). In our dataset, tariffs and trade flows vary across four dimensions, namely: im-

porting country, exporting country, product, and year. This allows us to directly control for (i)

unobserved demand factors such as consumer tastes using importer-product-year fixed effects, (ii)

unobserved supply factors such as productivity changes using exporter-product-year fixed effects,

(iii) unobserved pair-product-specific fixed costs of exporting (such as information costs) using

importer-exporter-product fixed effects, and (iv) unobserved pair-time-specific fixed costs (such

as political alliance) using importer-exporter-year fixed effects. To our knowledge, our empirical

strategy is the most comprehensive in terms of accounting for these four distinct unobserved fac-

tors that are important to the estimation of trade elasticities. Our proposed novel use of the within

estimator with partially balanced data could also avoid inconsistency and computational difficul-

ties associated with using the conventional within estimator and using a large number of dummy

variables for high-dimensional unobserved factors respectively. As a result, we can perform re-

gression analyses with product-level data while avoiding omitted variable bias (OVB) caused by

product-level MRTs and unobserved fixed costs which arise from information barriers, interest-

group lobbying, and government red-tape, to name just a few.7

Our empirical results show that ignoring zero observations and any one of the demand factors,

supply factors, product-level MRTs or fixed costs at the disaggregated level could overestimate

the trade elasticities (i.e. the absolute values of the estimates are smaller than their ’true values’).6See Broda and Weinstein (2006), Baier et al. (2014), and Dutt et al. (2013) for examples.7Anderson and Yotov (2011 and 2012) show importance of product level MRTs to accounting for the demand and

supply factors. The HMR and Chaney (2008) show the importance of controlling for fixed costs that is pair-product andpair-time varying unobserved factors which are related to fixed costs due to bureaucratic corruption and inefficiencyof trade that is not specific to a sector.

5

It implies that welfare gains from trade are much larger when measured with the trade cost elas-

ticity estimated using product-level data including zero observations with proper controls. As an

illustration, we provide some examples of welfare effects using trade costs elasticity and import-

penetration ratio measures following Arkolakis et al. (2012) and compare our results and those

from previous studies. Furthermore, our results imply that substantial heterogeneity exist in the

elasticity estimates across products. We examine elasticity heterogeneity using different market

structures of products and input quality in skill levels.

The rest of the paper is organized as follows. Section 2 describes the HMR model to introduce

an estimable gravity equation at the product level data using multi-way error components model.

Section 3 describes the data. In section 4, we explain our empirical model framework and provide

the estimation results. The last section provides concluding remarks and policy implications. Sim-

ulation and sensitivity analysis for our proposed estimator results are provided in the Appendix.

2 Gravity Model

2.1 Gravity Model at Sectoral Level

In this section, the gravity model is derived for sectoral (or product) level estimations. We adopt

the HMR model but extend to multiple sectors, k (k = 1,2, ...K). For aggregate data, K = 1; for

HS 2-digit data, K = 98; and for HS 6-digit data, K ⇡ 5,400. As in Chaney (2008) and Arkolakis

et al. (2012), we further assume that consumer’s utility function is the Cobb-Douglas for products

and Dixit-Stiglitz for varieties within a product.

A representative consumer in country j ( j = 1,2, ...J) has the following utility function at time

t:

Ujt =K

’k=1

ˆh2Hjkt

x jkt(h)(sk�1)/skdh

!qksk/(sk�1)

(1)

where x jkt(h) is a differentiated variety h of product k available in j at time t, Hi jt is the set of

6

varieties available for product k in j’s country at time t, skis the elasticity of substitution between

varieties of product k , which is assumed to vary across products with sk>1 and qk is the share of

income Yjt on goods from sector k with ÂKk qk = 1.

Although consumption can change over time as Yjt changes, for simplicity we assume no bor-

rowing or lending so that the consumer is maximizing consumption period by period instead of

in an intertemporal manner. There is also no dynamic in the production side unlike Alessandria

et al. (2014). Then, with country j’s income at time t, Yjtwhich equals the expenditure at time t we

obtain country j’s demand for variety h of product k at time t from (1):

x jkt(h) =p⇤jkt(h)

�skE jkt

P1�skjkt

(2)

where p⇤jkt(h) is the price of variety h of product k in country j at time t, E jkt = qkYjt is the

expenditure by j on goods from sector k from all origins at time t, and Pjkt is the price index

associated with varieties form sector k in j at time t where

Pjkt =

"ˆh2Hjkt

p⇤jkt(h)1�skdh

#1/(1�sk)

(3)

Using firms’ standard markup pricing equation in the monopolistic competition model and

following HMR (2008), we essentially have the same outcome for i0s imports of product k from j

as HMR (2008) for each product k in each time t.

Ti jkt =

✓c jktti jkt

akPit

◆1�sk

EiktNjktVi jkt (4)

where Ti jkt is the import of product k by country i from country j at time t, Njkt is the number of all

firms in sector k in j at time t, Vi jkt is a monotone function of the fraction of j’s firms that export

to i in sector k (which is based on firms’ relative productivity and fixed export costs to i) at time t.

c jkt is a measure of average productivity of firms in sector k in j at time , but different firms have

7

different productivity based on akt , ti jkt is the variable trade cost in sector k exported from j to i,

and (1�sk) is the trade elasticity with respect to variable trade cost for product k.

As in HMR, we also have

(Pikt)1�sk =

J

Âj=1

✓c jktti jkt

ak

◆1�sk

NjktVi jkt (5)

Vi jkt =

ˆ ai jkt

aLkt

(akt)1�sk dG(akt)

=gag�sk+1

Lt

(g �sk +1)(agHkt

�agLkt)Wi jkt (6)

Wi jkt = max

(✓ai jkt

aLkt

◆g�sk+1�1,0

)(7)

In the baseline specification, HMR assume that firm productivity 1/akt is truncated Pareto

distributed to the support [aLkt,aHkt ] so that G(akt) = (agkt � ag

Lkt)/(agHkt � ag

Lkt),g > (sk � 1). Pikt

is corresponding to i’s MRTs in product k at time t as introduced by Anderson and van Wincoop

(2003), but it additionally takes into account the impact of firm heterogeneity with the fixed costs

of exporting on prices in product k.

The model can be solved for the following estimable equation:

lnTi jkt = (sk �1)lnak � (sk �1)lnc jkt + lnNjkt +(sk �1)lnPikt

+lnEikt � (sk �1)lnti jkt + lnVi jkt (8)

Pikt , and Njkt and c jkt are unobserved, but they can be accounted for by using exporter-sector-

specific factors and importer-sector-specific factors. Thus, with Wi jkt, being a monotonic function

8

of Vi jkt , we can obtain the following estimable equation:

lnTi jtk =� (sk �1)lnti jkt + lnWi jkt +u1ikt + v1 jkt (9)

where u1ikt =(sk�1)lnPikt +lnEikt +(sk�1)lna , and v1 jkt = lnNjkt �(sk�1)lnc jkt +ln(gag�sk+1Lt )�

ln(g �sk +1)� ln(agHkt

�agLkt).

Here we depart from HMR and further assume that lnWi jkt could be decomposed into a number

of additively separable components:

lnWi jkt = w2i jk +h2i jt +u2ikt + v2 jkt + e2i jkt (10)

where w2i jk represents time-invariant pair-sector specific fixed costs, h2i jt is pair-time specific

random shock to fixed costs, u2ikt is importer-sector-time specific random shock to fixed costs,

v2 jkt is exporter-sector-time specific random shock to fixed costs, and e2i jkt is all remaining random

elements that could affect fixed costs of firm’s export market entry decision. These five factors

determine the fraction of firms that export from j to i in sector k at time t.

Likewise, we further assume that non-tariff-related variable trade costs can be decomposed into

five additively separable factors such that:

lnti jkt = ln(1+ tari f fi jkt)+w3i jk +h3i jt +u3ikt + v3 jkt + e3i jkt (11)

where w3i jk +h3i jt +u3ikt + v3 jkt + e3i jkt is the linear approximation of non-tariff-related variable

trade costs.

Substituting (10) and (11) into (4), we obtain the following equation:

lnTi jtk = a0 � (sk �1) · ln(1+ tari f fi jkt)+wi jk +hi jt +uikt + v jkt + ei jkt (12)

where wi jk ⌘ w2i jk � (sk � 1)w3i jk, hi jt ⌘ h2i jt � (s � 1)h3i jt , uikt ⌘ u1ikt + u2ikt � (sk � 1)u3ikt ,

v jkt ⌘ v1 jkt + v2 jkt � (sk � 1)v3 jkt , and ei jkt ⌘ e2i jkt � (sk � 1)e3i jkt +(s �sk)h3i jt with s is the

9

average of sk. As wi jk, hi jt , uikt , and v jkt are unobservable, we model them using fixed effects.

Here (sk �1) can be consistently estimated if we can assume

Cov(ln(1+ tari f fi jkt),ei jkt |wi jk,hi jt ,uikt ,v jkt) = 0 (13)

That is, we assume that once we have controlled for all i jt-, i jk-, ikt-, and jkt-varying factors,

the remaining trade costs factors that could vary in all four dimensions of i, j,k and t simultaneously

(i.e. ei jkt) are uncorrelated with tari f fi jkt . One possible scenario in which this assumption is

violated is that, when a country lowers its tariffs on certain imports from a specific source country

as a results of, for instance, trade agreement negotiations, it might provide assistance to the affected

industries or erect non-tariff barriers on those products as a compensation (Ray, 1981; Limao and

Tovar, 2011). However, as long as the assistance or non-tariff barriers are industry specific but not

industry and source country specific, they would be controlled for by uikt .

Also, the consistency condition (13) is not conditional on Ti jkt > 0 unlike the HMR method.

In order to directly incorporate zero trade flows in the estimation, we replace lnTi jkt in (9) with

ln(Ti jkt +1).

2.2 Self-Selection and Firm Heterogeneity

A key insight of HMR is that zero trade flows commonly observed in bilateral trade data are re-

sults of firms’ decision of self-selection into (or out of) foreign market and the decision is not

uniform across country due to firm heterogeneity.8 HMR suggest a two-stage procedure to account

for self-selection and firm heterogeneity for aggregate data. The first-stage estimation requires an

exclusion restriction variable that affects firms’ entrance into a foreign market but not their trade

volume.9 In order to extend the HMR approach to estimate product level models, the first stage8As we use product level data, we assume that firms’ self-selection of exporting occurs at pair-product level for the

rest of paper.9HMR first consider bilateral regulation measure as the exclusion restriction variable. Besides strong assumption

of excludability in the model for trade volume, a practical limitation of this exclusion restriction is data availability,which prompts HMR to consider an alternative variable of an index of common religion (between any pair). Data oncommon religion has been available only for sporadic periods until Maoz and Henderson (2013) construct a world

10

estimation requires an exclusion restriction variable that affects firms’ entrance into individual

product markets in a foreign country but not their trade volumes in each of those markets. How-

ever, finding an exclusion restriction that varies over pair-product, importer-product-time, and/or

exporter-product-time is extremely difficult due to data limitation.

To avoid data problems, we propose an estimator that allows us to by-pass the first-stage esti-

mation in the HMR’s two-stage procedure while still controlling for self-selection and firm hetero-

geneity. We assume that unobserved factors associated with self-selection and firm heterogeneity

could be approximated by four additively separable factors, wi jk +hi jt + uikt + v jkt , then we can

write an estimable equation using a multi-way error components model. Furthermore, it could

be estimated by the within estimator that treat unobserved factors wi jk +hi jt + uikt + v jkt as fixed

effects.

Therefore, when using disaggregate panel data, our estimator is superior to the two-stage HMR

method in several ways. First, it avoids data limitations as mentioned above. Second, fixed

effects are free from measurement errors and validity assumption associated with exclusion re-

striction (e.g. the religion variable). Next, besides pair-time unobserved factors associated with

self-selection and firm heterogeneity, it allows us to additionally control for pair-product, importer-

product-time, and exporter-product-time unobserved factors associated with self-selection and firm

heterogeneity using fixed effects..

Furthermore, those multi-dimensional fixed effects account for other unobserved factors as

well. For instance, importer (exporter)-product-time fixed effects additionally control for product-

time varying MRTs and exporter’s supply (importer’s demand), which are crucial to obtain unbi-

ased estimates, but again very difficult to collect at fairly dissagregate product level.

religion dataset for every five years from 1945 to 2010. While not to depreciate the value of this improved dataset onreligion, we should be cautious about the measurement errors due to the challenging nature in measuring them at thefirst place.

11

3 Data

We construct two sets of data. The first data set is HS 6-digit import data from UN COMTRADE

for 47 countries from 2002 to 2010 for about 5,310 products. The second data set is bilateral tariff

data from TRAINS of UNCTAD for the same sample. More detailed explanations of the data are

provided in the Appendix.

For each pair of country and any product with available tariffs, unreported entries of trade flows

are replaced with zero. In our data only positive observations are reported. Because every country

in our sample imports from other sample countries for all sample periods in aggregate level, we

presume that unreported entries at product level are zeros.

Conceptually, balanced data mean that for all pairs of countries (i j), products (k) and time (t),

observations are available for both the dependent variable (y) and all explanatory variables (X).

In practice, bilateral trade data can never be truly balanced. This is because, data is ’naturally’

unbalanced because countries do not conduct international trade with themselves so that Tii is not

defined for all i. Furthermore, there are some cases where we do not observe a country’s tariff rate

for a specific product for entire sample years. For example, in the cases of Argentina, no tariffs are

reported for about 20 out of 5,310 products. We leave them as missing to be truthful to the data.10

Finally, some product codes are removed, combined with others, or newly introduced and these

products could induce missing data over time. We define partially balanced data to deal with this

type of missing data.

Besides balanced data, we also consider unbalanced and partially balanced data (i.e. only part

of dimension is balanced among four dimensions) for the comparison purpose in evaluating our

estimator and conventional within estimator. The use of the unbalanced data in most of previous

studies is inevitable as they use only positive trade flows, i.e. zero trade flows are not imputed in10These two types of missing data together make up only less than 3% of all data. Because the proportion of

missing is very small, we regard the data as (almost) balanced in practical sense if there are not other sources ofmissing data. Extensive simulation experiments are performed with various data generating processes to examine theeffect of missing data for our proposed within estimators. Simulation results show that there is no size bias of theestimators. Results can be found in Kwak et al. (2012) for three-way error components model and in the appendix forfour-way error components model .

12

estimations. The degree of unbalancedness depends on the level of data aggregation, increasing up

to about 70% for HS6-digit data in our sample. We show that the size bias of conventional within

estimator increases with missing proportion in unbalanced data so that more care is required in

using within estimator with disaggregate data to avoid bias.

4 Empirical Framework

4.1 A Gravity Equation with 4-way Error Components

Our estimation equation is obtained from (12) in Section 2. :

ln(Ti jkt +1) = a0 +a · ln(1+ tari f fi jkt)+wi jk +hi jt +uikt + v jkt + ei jkt (14)

where ln(Ti jkt + 1) is log trade flows, tari f fi jkt is a tariff that importing country i imposes on

product k from exporting country j at time t. wi jk, hi jt , uikt and v jkt are pair-product, pair-

time, importer-product-time, and exporter-product-time heterogeneity, respectively and ei jkt is

an idiosyncratic error term. Our primary object of interests is to obtain a consistent estimate of

a = (1�s), the elasticity of trade costs. This specification implies that, in contrast to the the-

oretical model in section 2, here we assume the trade elasticity, 1�sk, is homogeneous across

all products so that 1�sk = 1�s . We relax this assumption later to adherent to the theoretical

model.

To control for unobserved factors, we adopt fixed effects following standard approach in the

studies.. However, unlike in 3-way error components (3WEC) model followed by most of studies

in the literature (e.g., Baier and Bergstrand (2007); Magee (2008); Baier et al. (2014) ) using fixed

effects in the 4-way error components (4WEC) model may not be trivial depending on degree

of balancedness of data. Using the dummy variable method is computationally challenging and

infeasible in most statistical package while within estimator could be inconsistent unless data are

13

balanced.11 In this sub-section, we discuss within estimators for different degrees of balancedness

of data and suggest a method to adopt in our estimations.

4.1.1 Within Estimator for Balanced Data

As provided in the Appendix, with balanced data and conditional exogeneity assumption on ei jkt ,

the conventional LSDV (Least Squared Dummy Variables) estimator to eq.(??) is numerically

equivalent to the pooled Least Squared (LS) estimator (See Wooldridge (2009); Abrevaya (2013)

for two dimensional panel data cases) in the following equation with the within transformed vari-

ables:

yi jkt = Xi jkta + ei jkt (15)

where multiple transformations that remove all four unobserved factors is defined as follows:

yi jkt = yi jkt � yi jk·� yi j·t � yi·kt � y· jkt

+yi j··+ yi·k·+ yi··t + y· jk·+ y· j·t + y··kt (16)

�yi···� y· j··� y··k·� y···t + y····

In eq.(16) a dot in the subscript denotes the averaging dimension. For instance, yi jk·=1

Ti jkÂT

t=1 si jktyi jkt ,

where si jt is an indicator for observability of both yi jkt and Xi jkt (i.e. si jkt = 1 if both yi jkt

and Xi jkt are observed, and zero otherwise), and Ti jk = ÂTt=1 si jkt , so that Ti jk = T for balanced

11Multi-way error components model are typically estimated using dummy variables (e.g. Antweiler (2001); Baltagiet al. (2001); Davis (2002) among others) or within transformations of variables to remove all or some of these errorcomponents (e.g. Matyas and Balazsi (2012) and Kwak et al. (2012)). However, an extension of these methods toestimate product level data are very difficult. Firstly, the dummy variables method confronts computation difficultiesarising from the large size of the data set. Antweiler (2001); Baltagi et al. (2001); Davis (2002) propose estimatorsthat can be applied to unnested and unbalanced bilateral trade data; in practice, however, implementation of theirmethod requires to specify the transformation matrix and to generate too many dummy variables. As a result, we arenot able to find any empirical study of gravity models implementing this type of estimators. On the other hand, thewithin-transformation based estimators generally are biased with unbalanced panel data although they do not sufferfrom computational difficulty.

14

data but Ti jk < T for unbalanced data.12 We denote averaging over other dimension as fol-

lows: yi·kt =1

ni jtÂn

j=1 si jtyi jt , nikt = Ânj=1 si jkt , y· jkt =

1m jkt

Âmi=1 si jktyi jkt and m jkt = Âm

i=1 si jkt , yi·k·

= 1Tik

ÂTt=1

1nikt

Ânj=1 si jktyi jkt . We can similarly define the rest of averaging terms in eq.(16).

The four error components, wi jk, hi jt , uikt , and v jkt , can be removed by demeaning over the

dimensions of t, k, j, and i, respectively. Our estimation strategy is to remove all four error compo-

nents by using four distinct within transformations. Four successive demeanings need to be applied

and these give yi jkt as the dependent variable in eq.(16). Also note that the order of demeaning in

t, k, j, and i does not matter for balanced data but it matters for the bias size in unbalanced data.

The transformations of X and e are denoted as Xi jt and ei jt , respectively, could be similarly de-

fined as in (16). We call eq.(16) the four-way within transformation as it removes all 4WECs. The

four-way within estimator is consistent only when the data is balanced and conditional exogeneity

assumption on ei jkt is satisfied; for unbalanced data it is inconsistent.13

Our sample includes 47 largest countries countries in terms of GDP in 2007 but excludes some

countries such as India that do not have tariffs data at least for one partner country. Therefore,

for each country at a given year, both tariff data and trade flows data imputed with zeros are

available for HS 6-digit products for all partner countries. Therefore, tariff data are balanced for

pair and product but not year. This is because each year variety of products vary over some years.

For instance, in 2007 reclassification occurs for some HS 6-digit products. In the process, some

product codes are removed, combined with others, or newly introduced. Therefore, data are not

balanced in strict sense. We name this partially balanced data and provide further discussion on

this in section 4.2.3.12Using selection indicator notation is very powerful to show our algebraic results as it manifests the sources of bias

for the conventional within estimators in unbalanced data.13We assume conditional exogeneity of idiosyncratic error terms throughout the paper. For the within estimator

in (15), non-zero terms remain after the within transformation. Moreover, these terms are not mean-zero and do notgo to zero as the sample size increases. These terms are all zero if si jkt = 1 for all observations as in the case ofbalanced data, but once si jkt becomes random non-zero terms, additional terms remain after transformation and causeinconsistency. See Matyas and Balazsi (2012); Kwak et al. (2012) for the detailed discussion of within transformationfor 3WEC models and Monte Carlo evidence.

15

4.1.2 Within Estimator for Unbalanced Data

In general, the within estimator that removes multiple error components in unbalanced data is

inconsistent. Bias arises when eq.(16) is applied to unbalanced data and its size depends upon

which of error components are the source of bias and the order of demeaning. Unlike the case

of balanced data, the values of the last eleven terms in eq.(16) (i.e. all terms that are averaging

two or more dimensions yi j··, yi·k·, yi··t , y· jk·, ...y...) now depend on the order of averaging. For these

eleven terms, we use superscripts to denote the order of averaging over various dimensions. For

instance, y jti·k· is obtained by averaging over the j-dimension first and the t-dimension second, i.e.

y jti·k· =

1Tik

ÂTt=1

1nikt

Ânj=1 si jktyi jkt , while yt j

i·k· is obtained the reversed order of averaging over the

two dimensions, i.e. yt ji·k· =

1nik

Ânj=1

1Ti jk

ÂTt=1 si jktyi jkt . The mean values depend on the order of

averaging, y jti·k· 6= yt j

i·k· unless by coincidence. This is because si jkt depends on all i, j, k, and t.

As a result, 1Tik

ÂTt=1

1nikt

Ânj=1 si jkt and 1

niktÂn

j=11

TikÂT

t=1 si jkt become distinctively different values.

Similarly, the mean values of the other eleven terms in eq.(16) depend upon the order of averaging.

Finally, for the y··· (i.e. yi jkt , yi jtk, yit jk, yitk j, yikt j, y jk jt , ......), there are 24 distinct sequences of four

dimensional averaging and their values are different each. For a 4WEC model, there are a total of

24 different orders of averaging and thus 24 different transformations based on eq.(16).

For any unbalanced data, we use the property of the within estimator that one of the error

components in eq.(??) can be completely removed by the within transformation without causing

any bias. For our purpose, the important implication is that t-dimensional demeaning, if performed

very first, can remove the unobserved factor wi jk:

si jktwi jk �1

Ti jk

T

Ât=1

si jktwi jk = si jktwi jk �wi jk1

Ti jk

T

Ât=1

si jkt = si jktwi jk �wi jk1

Ti jkTi jk = si jktwi jk �wi jk

where first equality hold because wi jk does not depend on t. Thus, for observed value, si jkt = 1,

si jktwi jk �1

Ti jk

T

Ât=1

si jktwi jk = si jktwi jk �wi jk = 0 (17)

16

Similarly, one error component can be removed by the within transformation such that k-

dimensional demeaning, i-dimensional demeaning, and j-dimensional demeaning, if performed

very first, can remove hi jt , uikt , and v jkt , respectively. Obviously, since only one of these four

demeanings can be performed first, three unobserved factors cannot be completely removed and,

thus, potentially become the sources of bias in the estimations.

Furthermore, less bias is retained if it is an earlier demeaning dimension that takes out the error

component that causes inconsistency.14 For instance, suppose that hi jt is the only unobserved

factor correlated with tari f fi jkt in equation (??). Let us consider demeaning over product and

four classes of demeaning orders: k · ··, ·k · ·, · · k·, · · ·k. Here k · ·· indicates demeaning over the

k dimension first (e.g. ki jt, k jti, kt ji etc.), ·k · · demeaning over the k dimension second, and so

forth. Within estimators of each of these demeaning order have different magnitudes of bias due to

omitted variable hi jt . We have already proven that the class k · ·· can completely remove OVB due

to hi jt . Simulation results in the Appendix show that absolute magnitude of OVB due to hi jt for

other classes increases as the demeaning order for k increases, i.e. k · ·· will cause no bias, ·k · · will

cause the second smallest bias and · · ·k the largest bias. A corollary of these findings is that, within

transformations can remove one error component completely and for the other error components

only partially. However, compared to the estimates from pooled OLS, the absolute magnitude of

the bias from each source is reduced from the within transformations.15

4.1.3 Within Estimator for Partially Balanced Data

Our sample is balanced in pair and product dimension but unbalanced in time dimension. In this

case, we can still design within transformations that could take out multiple error components

completely. By investigating the sources of biases and how biases arise from each demeaning of

within transformation, we could obtain a consistent within estimator using partial balanced of data,

si jkt = st . Our main insight to obtain a consistent estimator is that we remove wi jk as the first order

14The Appendix C provides some algebraic results.15Matyas and Balazsi (2012) suggest a solution is to remove one error component using within transformation, and

then model the remaining error components using dummy variables. However, in the case of using pair-product-levelpanel data, there are still way too many dummy variables to be feasibly estimate even using their method.

17

t-dimensional demeaning (note that t is not balanced dimension) and remove uikt , and v jkt as j and

i dimensional demeanings (note that data are balanced at pair level).

Let us consider transformations that remove wi jk, uikt , andv jkt completely. The demean order-

ing we use as an illustration is i- j-t. First, first order demeaning over dimension i can remove v jkt

as shown in section 4.2.2. Now, we consider uikt with double demeanings over dimensions i and j.

stuikt �1N

N

Âi=1

stuikt �1N

N

Âj=1

(stuikt �1N

N

Âi=1

stuikt) = stuikt � st1N

N

Âi=1

uikt � stuikt1N

N

Âj=1

+st1N

N

Âj=1

1 · 1N

N

Âi=1

uikt

= stuikt � st1N

N

Âi=1

uikt � stuikt + st1N

N

Âi=1

uikt ·1N

N

Âj=1

1

= stuikt � st1N

N

Âi=1

uikt � stuikt + st1N

N

Âi=1

uikt = 0

Finally, we consider wi jk with triple demeanings over dimensions i, j, and t.

stwi jk�1N

N

Âi=1

stwi jk�1N

N

Âj=1

(stwi jk�1N

N

Âi=1

stwi jk)�1T

T

Ât=1

[stwi jk�1N

N

Âi=1

stwi jk�1N

N

Âj=1

(stwi jk�1N

N

Âi=1

stwi jk)]

= stwi jk � st1N

N

Âi=1

wi jk � st1N

N

Âj=1

wi jk + st1N

N

Âj=1

1N

N

Âi=1

wi jk

�wi jk1T

T

Ât=1

st + st1N

N

Âi=1

wi jk +1T

T

Ât=1

st1N

N

Âj=1

wi jk �1T

T

Ât=1

st1N

N

Âj=1

1N

N

Âi=1

wi jk

= stwi jk � st1N

N

Âi=1

wi jk � st1N

N

Âj=1

wi jk + st1N

N

Âj=1

1N

N

Âi=1

wi jk

�wi jk + st1N

N

Âi=1

wi jk +1N

N

Âj=1

wi jk �1N

N

Âj=1

1N

N

Âi=1

wi jk

Therefore, for st = 1,we can rewrite the last equation as follows:

18

wi jk�1N

N

Âi=1

wi jk�1N

N

Âj=1

wi jk+1N

N

Âj=1

1N

N

Âi=1

wi jk�wi jk+1N

N

Âi=1

wi jk+1N

N

Âj=1

wi jk�1N

N

Âj=1

1N

N

Âi=1

wi jk = 0

and we obtain zero because all terms are canceled out each other.

In sum, using st = 1 and triple demeanings over i- j-t (or it could be shown for demeanings over

j-i-t), we algebraically show that within transformations can remove wi jk, uikt , andv jkt completely.

Thus, for all our estimations in results section, we obtain elasticity estimates from using within

transformations over i- j-t and dummy variables for hi jt unless indicated otherwise. Note that

number of dummy variables for hi jt is 20⇥19⇥9 = 3,420 and no computational difficulty arises

with less than 4,000 explanatory variables. We provide extensive simulation results for within

estimators with triple transformations and partially balanced data, st = si jkt in the Appendix.

4.2 Estimation

4.2.1 Trade Elasticities with Disaggregate data

Table 1 shows various estimation results of using HS6-digit disaggregate data. In column (1), we

only use non-zero observation while in columns (2)-(5) zero observations are included, and various

columns differ in term of the controls of unobserved factors.

At the HS6-digit level of disaggregation, about 67% of trade flow observations are zeros. The

inclusion of such large number of zero observations dramatically reduces the trade elasticity es-

timate from -3.6 down to -0.7. The reason for this is that almost all country pairs trade only a

limited number of HS6-digit level products at any time. For instance, countries that do not have

oil reserves cannot export crude oil, and developing countries do not have skilled labour to export

high-technology products. When zero trade flows are excluded from the estimation, the trade elas-

ticity estimate is determined entirely by the response of the intensive product margin of trade to the

reduction in tariffs. According to column (1), a 1% reduction in tariffs is expected to increase the

flows of existing trade link between two countries by 3.6%. When zero trade flows are included,

19

the estimate is determined by the response of both the intensive and the extensive product margins

of trade. Even if some firms are responsive to reduction in variable trade costs and start to export to

new market, the initial volume is likely to be small compared to the export volume to established

markets. As a result, a 1% reduction in tariff only increase the average trade flow per product

by a much smaller extent of 0.7%. However, it should be noted that column (2) has not properly

accounted for the product-specific unobserved heterogeneity including product-specific fixed costs

and MRTs, and therefore suffers from omitted variable bias.

As explained in Section 2, we assume these unobserved factors can be collectively decomposed

into hi jt , uikt , v jkt and wi jk. We start with controlling for i jt-varying unobserved factors in column

(3), which reduces the trade elasticity estimate marginally to -0.68. In column (4) we further

control for ikt- and jkt-varying unobserved factors. The elasticity estimate reduces from -0.68 to

-0.57. In column (5) we have the full set of controls for unobserved heterogeneity at the product

level; the additional control for i jk-varying factors further reduces the elasticity estimate to -0.43.

A comparison of columns (2) to (5) shows that the progressively more comprehensive control

of unobserved heterogeneity reduces the estimation bias. In particular, the absolute value of the

elasticity estimate is reduced by nearly 40% from -0.71 in column (2) to -0.43 in column (5). This

indicates that OVB is as sizable as aggregation bias. But bias from omitting zero observations is

by far the largest amongst the three sources of bias.

Arkolakis et al. (2012) show that for a range of trade models, including the Armington model

and new trade models with micro-foundation like Eaton and Kortum (2002) and Melitz and Otta-

viano (2008), the welfare gains from trade (compared to autarky) can be simply measured using

two statistics, the share of expenditure on domestic goods, l and the elasticity of imports with re-

spect to variable trade cost, f . Using their welfare change formula to evaluate the welfare change

in US’s year 2000 compared to Autarky, which is (1�l�1/f ) where l = 0.93, they illustrate that

the percentage change in real income needed to compensate a representative consumer for going

back to autarky is 0.7 percent to 1.4 percent depending if the trade elasticity are ranged from -5

to -10 as surveyed in Anderson and van Wincoop (2004). If we use the estimate obtained from

20

Table 1: Trade elasticities: HS 6-digit

(1) (2) (3) (4) (5)Disaggregate: HS6

Elasticity (1-s ) -3.606*** -0.705*** -0.681*** -0.571**** -0.427***(0.354) (0.123) (0.125) (0.268) (0.101)

ij, it, jt Yes Yes Yes Yes Yesijt No No Yes Yes Yes

ikt, jkt No No No Yes Yesijk No No No No Yes

Model log-linearNo. of obs. 6,016,280 18,158,824

Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

non-zero observations as in column (1), US’s gains from trade in year 2000 is implied to be 2

percent, slightly higher shown in Arkolakis et al. (2012) still seems to be small. When we use the

estimate obtained from data including zero observations with proper product-level controls as in

column (5), the implied US’s gains from trade in year 2000 increase to 16 percent. This simple

numerical example serves the purpose that, to evaluate the welfare impact of trade liberalization,

it is paramount to have an unbiased estimate of trade elasticity.

4.2.2 The Extensive Margin

Given that the elasticity estimate is substantially smaller when the extensive product margin of

trade is added to the intensive product margin of trade, it will be interesting to know that how the

extensive product margin of trade itself responses to change in variable trade costs.

We use the following model to estimate the effect on the probability of starting import.

I(Ti jkt > 0) = F(a0 +a · tari f fi jkt +wi jk +hi jt +uikt + v jkt) (18)

21

Table 2: Probability of starting new product trade

(1) (2) (3)Disaggregate: HS6

Probability -0.011 -0.068*** -0.038***(0.009) (0.023) (0.011)

ijt Yes Yes Yesikt, jkt No Yes Yes

ijk No No YesModel linear probability model

No. of obs. 18,158,824

Notes: Cluster (pair) robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the10%, 5%, 1% levels, respectively. We used balanced data and the within estimator.

where I is an indicator and F(·) is normal CDF. We estimate (18) using linear probability model

that allows to directly control four multi-way error components. Table 2 shows the probability of a

country starting to export a new product to a partner country in response to a tariff cut by its partner

for that product. Given this is a probability model, the tariff enters the model in level terms instead

of in logarithmic terms. The estimated probability varies depending on what type of unobserved

heterogeneity being accounted for. In column (3) where the control for unobserved heterogeneity

is most comprehensive, it is found that on average there is a close to 3.8% chance that a country

will start importing a new product from a partner country if the latter cuts the tariff on that product

by 1 percentage point. At least in a one-year horizon, the response of the extensive trade margin to

changes in variable trade cost is rather small.

4.2.3 Products from Different Market Structures

In this section, we decompose the products into three groups using the classifications in Rauch

(1999) and examine whether goods traded on organized exchanges have larger trade elasticity.

Rauch (1999) classifies goods into three groups: commodities, reference priced goods and differ-

entiated goods. Commodities are those goods traded on organized exchanges. Reference priced

goods are the goods that reference prices of them being published in trade journals, and the rest are22

differentiated goods.16 In our sample, 5.1%, 30.7% and 64.2% of the observations are commodi-

ties, reference priced and differentiated goods, respectively.

Broda and Weinstein (2006) use a model of import demand and supply equations and find the

commodities have higher elasticity of substitution than the other goods. However, as they point

out, commodities do not necessarily imply perfectly substitutable goods. They said, “For example,

although tea is classified by Rauch as a commodity, it is surely quite differentiated. Similarly, it

is hard to see why a commodity like “dried, salted, or smoked fish” would be more homogeneous

than a reference priced good like “fresh fish” or a differentiated good like “frozen fish.”

Table 4 shows the results.17 In column (1), commodity goods have the largest trade elasticity,

followed by referenced priced goods, and then differentiated goods. This is consistent with the

expectation in Rauch (1999) and finding in Broda and Weinstein (2006). However, it can be noticed

that the trade elasticity of differentiated goods is of an unexpected positive sign. This could be

due to fact that we have not fully accounted for all product-level unobserved heterogeneity and

therefore the estimate is biased.

When we progressively control for more unobserved factors, surprisingly the relative trade

elasticities of the three types of goods reverse. In column (3), differentiate goods now have the

largest elasticity and of a correct sign, while commodities goods have the smallest and insignificant

one. The reverse of relative magnitude can be explained as the following. The world markets

for commodities are more integrated (via organized market trading for instance). As a result,

changes in tariffs on a product between a pair has greater effects on the prices of that product

elsewhere, and the feedback (or general equilibrium) effect on the trade between the pair will

also be stronger. In the Anderson and van Wincoop (2003) framework, the relative prices of

goods in the rest of the world for a pair is denoted as MRTs. In our modeling setting, product-

level MRTs are captured by ikt and jkt fixed effects. Therefore, once we have controlled for ikt-

and jkt-unobserved heterogeneity, we eliminate this feedback effect. Because the feedback effect

could be larger for goods sold in more organized market, its removal may reverse the relative

16See Rauch (1999) for details.17We use the conservative definition from Rauch (1999). Using his liberal definition yields very similar results.

23

Table 3: Trade elasticities heterogeneity according to homogeneity of goods with unobserved het-erogeneity


Differentiated goods 0.483*** -0.512 -0.482***(0.169) (0.366) (0.143)

Reference priced goods -1.444*** -0.519* -0.300***(0.151) (0.281) (0.080)

Commodity goods -1.887*** 0.097 -0.041(0.217) (0.416) (0.183)


ijk No No YesModel log-linear

No. of obs. 15,835,089

Notes: Some sectors that were not categorized in these three groups are dropped in the estimations. Cluster (pair)robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels,respectively. We used balanced data and the within estimator.

trade elasticities of the three types of products in column (3) as compared to column (1).

4.2.4 Products of Different Skill-Intensity

Firms’ markups depend on trade elasticity. Higher trade elasticity yields lower markups. Skill

intensity of products may indicate the firms’ market power, where greater market power sets higher

markups. Firms produce higher skill intensity products, for example, contract lenses, may have

more market power compared firms producing lower skill intensity products, for example toilet

papers.

In this section, we divide the products based on skill-intensity based on the classification of

Basu (2011).18 In our sample, 25.5%, 34.8%, 16.7% and 23% are minerals, resource and low-skill

intensive manufacturing (e.g., toilet paper), medium-skill intensive manufacturing (e.g., air con-

ditioner) and high-skill intensive manufacturing (e.g., microscopes), respectively. Table 5 shows

the results. In column (1), without proper controls at product level, the trade elasticity estimates18See Appendix for details of the classification.

24

Table 4: Trade elasticities heterogeneity according to skill-intensity of goods with unobservedheterogeneity


Minerals -1.379*** -0.214 -0.187**(0.158) (0.216) (0.078)

Resource & low-skill intensive manufacturing -0.475* 0.410 -0.474***(0.252) (0.425) (0.159)

Medium-skill intensive manufacturing 6.722*** -5.129*** -0.368**(0.458) (0.762) (0.186)

High-skill intensive manufacturing 1.692*** -0.571 -0.216(0.248) (0.654) (0.141)


ijk No No YesModel log-linear

No. of obs. 14,712,838

Notes: Some sectors that were not categorized in these four groups are dropped in the estimations. Cluster (pair)robust standard errors are reported in parentheses. *, **, *** denote statistical significance at the 10%, 5%, 1% levels,respectively. We used balanced data and the within estimator.

for medium and high-skill intensive manufacturing have wrong signs and statistically significant.

However, with controlling for all fixed effects in column (3), the estimates change dramatically

and turn to be of correct signs. Resource and low-skill intensive manufacturing products have the

highest elasticity and of a correct sign while high-skill intensive manufacturing products have the

lowest and insignificant one. It implies that if countries’ final goods imports are more weighted on

low and medium-skill intensive manufacturing goods, for the same trade share the welfare gains

from trade would be smaller.

5 Conclusion

Trade elasticity is a key variable in assessing the gains from trade. In particular, gains from trade are

shown to be a power function of import penetration ratio and trade elasticity that, a small decrease

in the absolute value of trade elasticity implies a large increase in gains from trade. Previous

25

empirical literature almost uniformly suggests that the trade elasticity is of an order way bigger

than unity, implying questionably small gains from trade In this paper, we show that previous

estimations suffer from various sources of bias: data aggregation, pile-ups at zero trade flows,

heterogeneous pair-product fixed costs of exporting, and product-level MRTs.

To account for these sources of bias, this paper estimated trade elasticity using an new dataset

on tariffs, a novel empirical model, and an innovative estimation method. The new dataset devel-

oped in this paper contains tariff and bilateral trade data for 47 countries from 2002 to 2010 at the

HS 6-digit product level, which covers over 5,300 products. This is the most disaggregate tariff

and bilateral trade data that are currently available for such a large number of countries and years.

This highly disaggregate dataset provides us a platform to tackle bias due to data aggregation on

the one hand, but exacerbates the problem of pile-ups at zero trade flows on the other because

each country pair tends to trade a limited range of HS6-digit level products. A proper treatment

of zero trade flows at the product level requires us to account for heterogeneous pair-product fixed

costs of exporting. We propose to deal with this problem as well as product-level MRTs using a

gravity model with 4-way error components. These error components control for observed and

unobserved heterogeneity at i jt, ikt, jkt and i jk dimensions using fixed effects. Estimating such as

4WEC models is not trivial, especially with unbalanced data, as being the case here. We therefore

suggest a novel application of the within estimator so that it is consistent even with unbalanced

data.

It is shown that, once these sources of bias are arrested, the trade elasticity estimate is reduced

by a fraction of one-tenth, implying much sizable welfare gains from trade. It is also found that

the response of the extensive product margin of trade to tariff cut is quite small at least in the short

run, leaving the intensive product margin of trade the main channel through which tariff changes

affect trade flows. Lastly, there is evidence that there are large variations in the trade elasticities

across sectors of different levels of market structures and skill-intensity.

26

A Data descriptions

For our analysis we need bilateral tariff data at product level. Bilateral tariff data at sectoral levels,

however, are not readily available and need to be newly constructed. We construct a dataset for

time-variant bilateral tariffs at the HS6-digit level for every pair in our sample as follows.19

First, we download each importing country’s HS6-digit simple average applied MFN tariffs

from the World Integrated Trade Solution (WITS). For countries engaged in a Customs Union

(CU), tariff schedules are available at the union level only. We therefore, use these CU tariff

data to generate country level MFN tariff data for the member countries of a CU. Also, for some

countries, MFN tariffs are not available for certain years. In those cases, we use the data from the

closest previous available year as a substitute. The MFN tariffs are unilateral instead of bilateral.

To obtain bilateral tariff data, for each year we apply an importing country’s MFN tariffs to all its

trading partner countries. This procedure is justifiable as our sample includes the WTO members

only.

Preferential tariffs are collected separately for applicable pairs for each year also at the HS6-

digit level. If preferential tariffs on certain products are imposed by an importing country are

applicable to multiple exporting countries simultaneously, data are observed at the group level

only. To identify which countries belong to a specific group, we use the Preference Beneficiaries

data from the Trade Analysis and Information System (TRAINS), and then generate preferential

tariff data for each pair. Like MFN tariffs, for the countries engaged in a CU, we only observe

the preferential external tariff schedules at the union level. Therefore, we generate an individual

member country’s bilateral tariff data using the CU tariff data. Preferential tariffs are also not

always reported for all years for many countries, so we use the data from the closest previous

available year as a substitute for an unavailable year.

For given pair, product and year, if data for both MFN and preferential tariffs are available,

the latter is lower with only a few exceptions. So we use the lower rate between the two that is19The tariff data are standard for all countries only up to the HS6-digit level. For more disaggregated levels,

countries do not always use the same codes to define products.

27

available.20

All data are converted to HS2007 6 digit using UN correspondence tables. Trade and tariff data

is from 2002 to 2010, so all data from 2002-2006 are converted into HS2007 product code. In Ta-

ble 3, Rauch (1999)’s SITC Revision 2 4-digit data on product homogeneity/heterogeneity are also

converted into HS2007 product code. In Table 4, in order to classify the skill requirement level, we

use Basu (2011)’s classifications. He classified the products in to 7 categories: (A) non-fuel pri-

mary commodities, (B) resource-intensive manufactures, (C) low skill- and technology-intensive

manufactures, (D) medium skill- and technology intensive manufactures, (E) high skill- and tech-

nology intensive manufactures, (F) mineral fuels, and (G) unclassified products. For presentation

purpose, we combine (A) and (F) and classify them “Minerals”, and combine (B) and (C) and

dropped (G).

20Alternatively, we could use preferential tariffs whenever they are available, otherwise use MFN tariffs. Thedifference it makes to the tariff variable after averaging over all products is negligible.

28

B The equivalence between LSDV method and the within esti-

mator for balanced data

For the sake of simplicity of illustration, we consider 3-way error component model instead of

4-way error component model. Given that both 3-way and 4-way error component models have

multiple error component, both require multiple with transformations so that the procedure is ba-

sically the same. A extension to 4-way error component model is straightforward. The following

equation is considered for an illustration of transformation:

yi jt = Xi jta +uit + v jt +wi j + ei jt

where i = 1,2, ...n and j = 1,2, ...m are country identifiers and t = 1,2, ...T the time identifier.

In matrix format, we can rewrite:

y = Xa +u+v+Dww + e (19)

where y=(y111,y112, ...,y11T ,y121, ...,y12T ,y131...,y1m1, ...,y1mT ,y211, ...,y21T ,y221, ..,ynm1, ..,ynmT )

is a m⇥n⇥T by 1 vector. In matrix format, we stack over t first, then over j and, finally, over i.

Qw = IT �Dw(D0wDw)

�1D0w

Qw =

0

BBBBBBB@

IT

IT

. . .

IT

1

CCCCCCCA

�

0

BBBBBBB@

iT

iT. . .

iT

1

CCCCCCCA

·

29

2

66666664

0

BBBBBBB@

i0T

i0T. . .

i0T

1

CCCCCCCA

0

BBBBBBB@

iT

iT. . .

iT

1

CCCCCCCA

3

77777775

�10

BBBBBBB@

i0T

i0T. . .

i0T

1

CCCCCCCA

Qw =

0

BBBBBBBBBB@

IT

IT

. . .. . .

IT

1

CCCCCCCCCCA

�

0

BBBBBBBBBB@

1T iT i0T

1T iT i0T

. . .. . .

1T iT i0T

1

CCCCCCCCCCA

Qw =

0

BBBBBBBBBB@

IT � 1T iT i0T

IT � 1T iT i0T

. . .. . .

IT � 1T iT i0T

1

CCCCCCCCCCA

where iT is a T ⇥1 column vector of 1, Qw is a m⇥n⇥T by m⇥n⇥T matrix and Dw is a m⇥n⇥T

by m⇥n matrix.

Therefore, we obtain:

Qwy =

0

BBBBBBBBBB@

IT � 1T iT i0T

IT � 1T iT i0T

. . .. . .

IT � 1T iT i0T

1

CCCCCCCCCCA

0

BBBBBBBBBBBBBBBBB@

y11

y12...

y1m

y21...

ynm

1

CCCCCCCCCCCCCCCCCA

where y11 = (y111,y112, ...,y11T )0 and, similarly, ynm = (ynm1,ynm2, ...,ymnT )0.30

Qwy =

0

BBBBBBBBBBBBBBBBB@

y111 � y11 ⌘ y⇤111

y112 � y11 ⌘ y⇤112...

y1mT � y1m ⌘ y⇤1mT

y211 � y21 ⌘ y⇤211...

ynmT � ymn ⌘ y⇤nmT

1

CCCCCCCCCCCCCCCCCA

where y11 =1T ÂT

t=1 y11t , y1m = 1T ÂT

t=1 y1mt , and ynm = 1T ÂT

t=1 ynmt . Therefore, pre-multiplying by

Qw demeans variables using the mean over t.

We can rewrite (19), after pre-multiplying it by Qw, as follows:

y⇤i jt = X⇤i jta +u⇤it + v⇤jt + e⇤i jt


y⇤ = X⇤a +Duu⇤+v⇤+ e⇤ (20)

where y⇤=(y⇤111,y⇤121, ...,y

⇤1m1,y

⇤112, ...,y

⇤1m2,y

⇤113...,y

⇤1m3, ...,y

⇤1mT ,y

⇤211, ...,y

⇤2m1,y

⇤212, ..,y

⇤n1T , ..,y

⇤nmT )

is a m⇥n⇥T by 1 vector. In matrix format, we stack over j first, then over t and, finally, over i.

Qu = Im �Du(D0uDu)

�1D0u

Qu =

0

BBBBBBB@

Im

Im

. . .

Im

1

CCCCCCCA

�

0

BBBBBBB@

im

im. . .

im

1

CCCCCCCA

·

31

2

66666664

0

BBBBBBB@

i0m

i0m. . .

i0m

1

CCCCCCCA

0

BBBBBBB@

im

im. . .

im

1

CCCCCCCA

3

77777775

�10

BBBBBBB@

i0m

i0m. . .

i0m

1

CCCCCCCA

Qu =

0

BBBBBBBBBB@

Im

Im

. . .. . .

Im

1

CCCCCCCCCCA

�

0

BBBBBBBBBB@

1m imi0m

1m imi0m

. . .. . .

1m imi0m

1

CCCCCCCCCCA

where im is a m⇥1 column vector of 1, Qu is a m⇥n⇥T by m⇥n⇥T matrix and Du is a m⇥n⇥T

by n⇥T matrix.


Quy⇤ =

0

BBBBBBBBBB@

Im � 1m imi0m

Im � 1m imi0m

. . .. . .

Im � 1m imi0m

1

CCCCCCCCCCA

0

BBBBBBBBBBBBBBBBB@

y⇤11

y⇤12...

y⇤1T

y⇤21...

y⇤nT

1

CCCCCCCCCCCCCCCCCA

where y⇤11 = (y⇤111,y⇤121, ...,y

⇤1m1)

0 and, similarly, y⇤nT = (y⇤n1T ,y⇤n2T , ...,y

⇤nmT )

0.

32

Quy⇤ =

0

BBBBBBBBBBBBBBBBB@

y⇤111 � y⇤11 ⌘ y#111

y⇤121 � y⇤11 ⌘ y#121

...

y⇤1mT � y⇤1T ⌘ y#1mT

y⇤211 � y⇤21 ⌘ y#211

...

y⇤nmT � y⇤nT ⌘ y#nmT

1

CCCCCCCCCCCCCCCCCA

where y⇤11 = 1m Âm

j=1 y⇤1 j1, y⇤1T = 1m Âm

j=1 y⇤1 jT , and y⇤nT = 1m Âm

j=1 yn jT . Therefore, pre-multiplying

by Qu demeans variables using the mean over j. We can rewrite (20), after pre-multiplying it by

Qu, as follows:

y#i jt = X#

i jta + v#jt + e#

i jt


y# = X#a +Dvv# + e# (21)

We stack over i first, then over t and, finally, over j.

Qv = In �Dv(D0vDv)

�1D0v

Qv =

0

BBBBBBB@

In

In

. . .

In

1

CCCCCCCA

�

0

BBBBBBB@

1n ini0n

1n ini0n

. . .

1n ini0n

1

CCCCCCCA

where in is a n⇥1 column vector of 1, Qv is a m⇥n⇥T by m⇥n⇥T matrix and Dv is a m⇥n⇥T

by m⇥T matrix.


33

Qvy# =

0

BBBBBBBBBB@

In � 1n ini0n

In � 1n ini0n

. . .. . .

In � 1n ini0n

1

CCCCCCCCCCA

0

BBBBBBBBBBBBBBBBB@

y#11

y#12...

y#1T

y#21...

y#mT

1

CCCCCCCCCCCCCCCCCA

where y#11 = (y#

111,y#211, ...,y

#n11)

0 and, similarly, y#mT = (y#

1mT ,y#2mT , ...,y

#nmT )

0.

Qvy# =

0

BBBBBBBBBBBBBBBBB@

y#111 � y#

11 ⌘ ey111

y#211 � y#

11 ⌘ ey211...

y#n1T � y#

1T ⌘ eyn1T

y#121 � y#

21 ⌘ ey121...

y#nmT � y#

mT ⌘ eynmT

1

CCCCCCCCCCCCCCCCCA

Therefore, pre-multiplying by Qv demeans variables using the mean over i. We can rewrite

(21), after pre-multiplying it by Qv as follows:

eyi jt = eXi jta +eei jt (22)

In conclusion, we establish that sequential least square dummy variables approach on (19), (20)

and (21) is equivalent to the within estimator with our proposed transformations of variables as in

(22).

34

C Within transformation for unbalanced data

For the sake of simplicity of illustration, we consider 3-way error component model instead of 4-

way error component model. Similar to in the Appendix B, both 3-way and 4-way error component

models require multiple within transformations to remove multiple unobserved heterogeneity fac-

tors so that the procedure of within transformations is basically the same. Consider the following

3-way error components model.

yi jt = Zi jtb +wi j +uit + v jt + ei jt

As shown in section 4.3.2, the first order demeaning can remove one unobserved heterogeneity

completely. On the other hand, the second and third order demeaning can only partially eliminate

unobserved heterogeneity. Incomplete elimination of uit and v jt occurs because non-first-order

summations over i, j, t provide distinct values according to the order of summation when there are

missing observations in the i and j dimensions. Equations (23), (24), and (25) provide demeaned

values for unobserved heterogeneity for wi j, uit and v jt , respectively, after the t � i� j within

transformation.

ewi j = si jtwi j �1

Ti jST

t=1si jtwi j �1

NitSN

j=1si jtwi j �1

NjtSN

i=1si jtwi j +1

NitSN

j=11

Ti jST

t=1si jtwi j

+1

NjtSN

i=11

Ti jST

t=1si jtwi j +1

NitSN

j=11

NjtSN

i=1si jtwi j �1

NitSN

j=11

NjtSN

i=11

Ti jST

t=1si jtwi j(23)

= si jwi j � si jwi j �1Ni

SNj=1si jwi j �

1Nj

SNi=1si jwi j +

1Ni

SNj=1si jwi j

+1

NjSN

i=1si jwi j +1Ni

SNj=1

1Nj

SNi=1si jwi j �

1Ni

SNj=1

1Nj

SNi=1si jwi j

= 0

where, for the second equality in (23), we use 1Ti j

STt=1si jtwi j =

1Ti j

Ti j · si jtwi j, and for i j-variant

variables si jt , N and T do not depend on t (i.e. si jt = si j,Nit = Ni, Njt = Nj).

35

ev jt = si jtv jt �1

Ti jST

t=1si jtv jt �1

NitSN

j=1si jtv jt �1

NjtSN

i=1si jtv jt +1

NitSN

j=11

Ti jST

t=1si jtv jt

+1

NjtSN

i=11

Ti jST

t=1si jtv jt +1

NitSN

j=11

NjtSN

i=1si jtv jt �1

NitSN

j=11

NjtSN

i=11

Ti jST

t=1si jtv jt

= si jtv jt �1

Ti jST

t=1si jtv jt �1

NitSN

j=1si jtv jt � si jtv jt +1

NitSN

j=11

Ti jST

t=1si jtv jt

+1

NjtSN

i=11

Ti jST

t=1si jtv jt +1

NitSN

j=1si jtv jt �1

NitSN

j=11

NjtSN

i=11

Ti jST

t=1si jtv jt (24)

= (� 1Tj

STt=1s jtv jt +

1Nt

SNj=1

1Tj

STt=1s jtv jt)� (� 1

NjtSN

i=11Tj

STt=1s jtv jt +

1Nt

SNj=1

1Njt

SNi=1

1Tj

STt=1s jtv jt)

where, for the second equality in (24), we make use of the facts that 1Njt

SNi=1si jtv jt =

1Njt

Njt ·si jtv jt =

si jtv jt , and that for jt-variant variables si jt , N and T do not depend on i (i.e. si jt = s jt , Nit = Nt ,

Ti j = Tj). In the last line in (24), the first term is not equal to zero if there are missing observations

in the j dimension and the second term is not equal to zero if there are missing observations in the

i and j dimensions.

euit = si jtuit �1

Ti jST

t=1si jtuit �1

NitSN

j=1si jtuit �1

NjtSN

i=1si jtuit +1

NitSN

j=11

Ti jST

t=1si jtuit

+1

NjtSN

i=11

Ti jST

t=1si jtuit +1

NitSN

j=11

NjtSN

i=1si jtuit �1

NitSN

j=11

NjtSN

i=11

Ti jST

t=1si jtuit

= si jtuit �1

Ti jST

t=1si jtuit � si jtuit �1

NjtSN

i=1si jtuit +1

NitSN

j=11

Ti jST

t=1si jtuit

+1

NjtSN

i=11

Ti jST

t=1si jtuit +1

NitSN

j=11

NjtSN

i=1si jtuit �1

NitSN

j=11

NjtSN

i=11

Ti jST

t=1si jtuit (25)

= (� 1Ti

STt=1situit +

1Nt

SNi=1

1Ti

STt=1situit)+(� 1

NtSN

i=1situit +1

NitSN

j=11Nt

SNi=1situit)

+(1

NitSN

j=11Ti

STt=1situit �

1Nit

SNj=1

1Nt

SNi=1

1Ti

STt=1situit)

= (� 1Ti

STt=1situit +

1Nt

SNi=1

1Ti

STt=1situit)� (� 1

NitSN

j=11Ti

STt=1situit +

1Nit

SNj=1

1Nt

SNi=1

1Ti

STt=1situit)

where, for the second equality in (25), we make use of the facts that 1Nit

SNj=1si jtuit =

1Nit

Nit ·si jtuit =

si jtuit , and that for it-variant variables si jt , N and T do not depend on j (i.e. si jt = sit , Njt = Nt ,36

Ti j = Ti). In the last line in (25), the first term is not equal to zero if there are missing observations

in the i dimension and the second term is not equal to zero if there are missing observations in the

i and j dimensions.

D Monte Carlo Simulation

For 3WEC model, simulation results for the within estimators are reported in Kwak et al. (2012).

To be added for 4WEC model.

References

Abrevaya, J., 2013. The projection approach for unbalanced panel data. The Econometrics Journal

16, 161–178.

Alessandria, G., Choi, H., Ruhl, K., 2014. Trade adjustment dynamics and the welfare gains from

trade. NBER Working Paper Series 20663.

Anderson, J. E., 1979. A theoretical foundation for the gravity equation. American Economic

Review 69 (1), 106–116.

Anderson, J. E., van Wincoop, E., 2003. Gravity with gravitas: a solution to the border puzzle.

American Economic Review 93 (1), 170–192.

Anderson, J. E., van Wincoop, E., 2004. Trade costs. Journal of Economic Literature 42 (3), 691–

751.

Antweiler, W., 2001. Nested random efects estimation in unbalanced panel data. Journal of Econo-

metrics 101, 295–313.

Arkolakis, C., Costinot, A., Rodriguez-Clare, A., February 2012. New trade models, same old

gains? American Economic Review 102 (1), 94–130.

37

Baier, S. L., Bergstrand, J. H., 2001. The growth of world trade: tariffs, transport costs, and income

similarity. Journal of International Economics 53 (1), 1–27.

Baier, S. L., Bergstrand, J. H., 2007. Do free trade agreements actually increase members’ interna-

tional trade? Journal of International Economics 71 (1), 72–95.

Baier, S. L., Bergstrand, J. H., Feng, M., 2014. Economic integration agreements and the margins

of international trade. Journal of International Economics 93 (2), 339–350.

Baltagi, B. H., Song, S., Jung, B. C., 2001. The unbalanced nested error component regression

model. Journal of Econometrics 101, 357–381.

Basu, S., 2011. Retooling trade policy in developing countries: Does technology intensity of ex-

ports matter for gdp per capita. Policy Issues in International Trade and Commodities 56.

Bergstrand, J. H., Egger, P., Larch, M., 2013. Gravity redux: Estimation of gravity-equation coeffi-

cients, elasticities of substitution, and general equilibrium comparative statics under asymmetric

bilateral trade costs. Journal of International Economics 89 (1), 110–121.

Broda, C., Weinstein, D. E., May 2006. Globalization and the gains from variety. The Quarterly

Journal of Economics 121 (2), 541–585.

Caliendo, L., Parro, F., 2014. Estimates of the trade and welfare effects of nafta. The Review of

Economic Studies, rdu035.

Chaney, T., 2008. Distorted gravity: The intensive and extensive margins of international trade.

American Economic Review 98 (4), 1707–21.

Davis, P., 2002. Estimating multi-way error components models with unbalanced data structures.

Journal of Econometrics 106, 67–95.

Dutt, P., Mihov, I., Van Zandt, T., 2013. Does wto matter for the extensive and the intensive margins

of trade? Journal of International Economics forthcoming.

38

Eaton, J., Kortum, S., 2002. Consistent estimation from partially consistent observations. Econo-

metrica 70, 1741–1779.

Harrigan, J., 1993. Oecd imports and trade barriers in 1983. Journal of International Economics

35 (1-2), 91–111.

Helpman, E., Melitz, M., Rubinstein, Y., 2008. Estimating trade flows: Trading partners and trad-

ing volumes. The Quarterly Journal of Economics 123 (2), 441–487.

Hummels, D., 2001. Toward a geography of trade costs. Tech. rep., Mimeo, Purdue University.

Koenig, P., Mayneris, F., Poncet, S., 2010. Local export spillovers in france. The European Eco-

nomic Review 54 (4), 622–641.

Krautheim, S., 2012. Heterogeneous firms, exporter networks and the effect of distance on inter-

national trade. Journal of International Economics 87 (1), 27–35.

Kwak, D. W., Cheong, J., Tang, K. K., 2012. A within estimator for three-level data: An application

to the wto effect on trade flows. mimeo, School of Economics Discussion Paper No. 501, 2012.

Limao, N., Tovar, P., 2011. Policy choice: Theory and evidence from commitment via international

trade agreements. Journal of International Economics 85 (2), 186–205.

Magee, C. S., 2008. New measures of trade creation and trade diversion. Journal of International

Economics 75 (2), 349–362.

Maoz, Z., Henderson, E. A., 2013. The world religion dataset, 1945-2010: Logic, estimates, and

trends. International Interactions 39 (3), 265–291.

Matyas, L., Balazsi, L., 2012. The estimation of multi-dimensional fixed effects panel data models.

mimeo.

Melitz, M. J., November 2003. The impact of trade on intra-industry reallocations and aggregate

industry productivity. Econometrica 71 (6), 1695–1725.39

Melitz, M. J., Ottaviano, G. I., 2008. Market size, trade, and productivity. The review of economic

studies 75 (1), 295–316.

Rauch, J. E., 1999. Networks versus markets in international trade. Journal of International Eco-

nomics 48 (1), 7–35.

Ray, E. J., 1981. The determinants of tariff and nontariff trade restrictions in the united states.

Journal of Political Economy 89 (1), 105–21.

Wansbeek, T., Kapteyn, A., 1989. Estimation of the error-components model with incomplete

panels. Journal of Econometrics 41, 341–361.

Wooldridge, J., 2009. Correlated random effects models with unbalanced panels. mimeo.

40

estimating trade elasticities using product-level data · illustration, we provide some examples of...

Documents