TECHNOLOGY SUPPORT AND POST-ADOPTION IT SERVICE USE:
EVIDENCE FROM THE CLOUD
German F. Retana INCAE Business School
Alajuela, Costa Rica [email protected]
Chris Forman, Sridhar Narasimhan, Marius Florin Niculescu, D. J. Wu
Georgia Institute of Technology Scheller College of Business
800 West Peachtree NW, Atlanta GA 30308 {chris.forman, sri.narasimhan, marius.niculescu, dj.wu}@scheller.gatech.edu
First Version: September 2012. This Version: June 2015
Abstract
Does a provider’s technology support strategy influence its buyers’ post-adoption IT service use? We study this question in the context of cloud infrastructure services. The provider offers two levels of support, basic and full. Under basic support, the provider handles simple service quality issues. Under full support, the provider also offers education, training, and personalized guidance. Using a unique data set on public cloud infrastructure services use by 22,179 firms from March 2009 to August 2012 and fixed effects dynamic panel data models, we find that buyers who opt for full support use 31.84% more of the service than those who do not. Moreover, buyers continue learning from the provider as they continue having access to full support, and full support has a stronger influence on buyer behavior the longer it has been accessed. The effect of technology support on the effectiveness of buyers’ IT use follows a similar pattern: it increases upon switching from basic to full support and its impact grows over time.
Keywords: IT service, organizational learning, IT use, cloud computing, Infrastructure-as-a-Service, technology support, service strategies
TECHNOLOGY SUPPORT AND POST-ADOPTION IT SERVICE USE:
EVIDENCE FROM THE CLOUD
1. Introduction
Businesses are increasingly shifting their information technology (IT) infrastructure from
traditional on-premises deployment to the cloud to take advantage of the commoditization of
some IT resources. In light of these changes, it is important to understand the impact of a
cloud provider's post-adoption technology support strategy on buyers' IT use. We explore this
question empirically in this research note.
The challenges of new technology adoption are well documented in the information
systems (IS) literature. Significant knowledge barriers cause firms to delay not only the adoption
(Attewell 1992; Chatterjee et al. 2002) but also the actual assimilation of IT (Fichman and
Kemerer 1997; Fichman and Kemerer 1999).1 While the extant literature has emphasized the
importance of organizational learning in overcoming knowledge barriers (Attewell 1992; Fichman and
Kemerer 1997), much less is known about how providers' strategies related to knowledge
transfer affect buyers' consumption levels. We aim to fill this gap in understanding by examining
how different levels of a provider’s technology support may influence the manner and extent to
which a buyer uses a new IT service.
In our research setting, the provider’s buyers use its hardware resources and choose
between two levels of technology support, basic or full. A prime goal of full support is to educate
buyers on how to best use the cloud infrastructure service and adapt it to their idiosyncratic
1 In addition to knowledge barriers, researchers have documented several other factors that drive post-adoption variations in usage (Parthasarathy and Bhattacherjee 1998; Zhu and Kraemer 2005; Zhu et al. 2006). These studies focus mostly on buyers and their internal capabilities, rather than on their interactions with the provider.
business needs. Under full support, buyers receive personalized guidance and training,
and thus have the opportunity to learn directly from the provider’s prior experience in deploying
applications in the cloud. Buyers unwilling to pay the price premium for full support receive
only the default basic level of support.
We evaluate the implications of full support for buyer behavior. We collect unique data
from a major global public cloud provider of infrastructure services who sells computing power
and storage. Our panel data consist of consumption time series for 22,179 firms that used the
provider’s service at some point between March 2009 and August 2012. We use fixed effects
dynamic panel data models to compare buyers’ use of the service before adopting full support
and during their continued access to full support. We find that buyers who adopt and continue
having access to full support use, on average, 31.84% more of the IT service relative to buyers
who have only had access to basic support, suggesting that technology support has important
implications for buyer behavior. To our knowledge, this is the first study to quantitatively
document how support can influence IT service use. Furthermore, we show that the impact of
technology support grows over time, providing suggestive evidence that technology support
facilitates buyer learning.
We also probe how omitted variables may influence our results. A particular worry is
reverse causality, i.e., the support choice decision may follow IT use. To address this concern,
we first run our models employing matched subsamples that are constructed using a coarsened
exact matching (CEM) procedure (Blackwell et al. 2010) based on buyers’ attributes and usage
of the service before they upgrade from basic to full support. Second, we leverage detailed data
on buyers’ support interactions through online live chat sessions and support tickets as the basis
for instruments for buyer decisions to upgrade to full support. Third, we estimate dynamic panel
data models that include lagged dependent variables and use deeper lags of our variables as
instruments for both IT use and the support upgrade decision using a generalized method of
moments (GMM) estimation approach (Arellano and Bover 1995; Blundell and Bond 1998). We
augment this latter approach with our support-based instruments. The estimates across these
various subsamples and models are qualitatively consistent with our main findings.
We also investigate the impact of technology support on IT use by examining alternative
measures of infrastructure use. Specifically, we provide evidence that technology support helps
buyers make better and more efficient use of the service by quantifying the effects that full
support has on buyers’ likelihood of deploying horizontally distributed and scalable
architectures. We find that buyers increase the fraction of servers they run in a parallel and
horizontally scalable architecture by 3.19 percentage points after they switch from basic to full
support. Given that the mean proportion of servers running in parallel in our sample is only 12%,
this is an economically significant change in behavior.
Besides informing the IS literature on post-adoption usage of new technology, our results
have important implications for managers. The adoption rates of cloud infrastructure services
have been significantly below expectations (Emison 2013; Microsoft and Edge Strategies 2011;
SearchDataCenter.com 2011). A potential reason for this pattern is that these services are not
offered as fully outsourced, turnkey and ready-to-use solutions for firms. Surveys during the time
span of our data (e.g., Symantec 2011) suggested that most buyers were not well prepared to use
cloud services and that helping them overcome their knowledge limitations is vital for the
success of the cloud model. Our results provide quantitative evidence of the importance of
overcoming such knowledge barriers to cloud service use.
2. Theory Background
In this section we provide motivation for how interactions with a service provider can increase
service use through knowledge transfer. We will provide specific testable implications of this
motivation in Section 3.
Firms adopting new technologies often face uncertainty over how to adapt these to the
idiosyncratic environments where they will be embedded, as well as broader issues revolving
around the complementary organizational and process changes required for new IT to be
deployed successfully (Fichman 2004; Hitt et al. 2002; Wu et al. 2013). It is well known that
firms’ internal capabilities and technical know-how affect both the timing of new IT adoption
(Attewell 1992; Bresnahan and Greenstein 1996; Forman et al. 2008) as well as post-adoption
usage (Parthasarathy and Bhattacherjee 1998; Zhu and Kraemer 2005; Zhu et al. 2006). In
particular, firms are known to delay not only the adoption (initial purchase) but also the actual
assimilation of a technology because of knowledge barriers (Åstebro 2004; Fichman and
Kemerer 1997).2 Several studies have stressed the importance of organizational learning in
overcoming the knowledge barriers of new technology adoption and use (Attewell 1992;
Chatterjee et al. 2002; Fichman and Kemerer 1997).
Third parties such as consultants or other firms can often serve as useful repositories of
knowledge on how to adopt and use new technologies (Bresnahan and Greenstein 1996; Chwelos
et al. 2001). However, an important problem is how to transfer tacit and sticky knowledge on
technology use to new settings (Alavi and Leidner 2001). Such knowledge transfer—where a
2 At the individual level, for many services, buyers frequently play a dual role as both recipients and producers of the service, performing actions that are essential to the value they receive from the service. This phenomenon is known as service co-production (e.g., Xue et al. 2011). Extant research has consistently shown that customers’ capabilities in co-producing the service are a key determinant of their adoption and continued use (e.g., Buell et al. 2010; Frei 2008; Xue and Harker 2002; Xue et al. 2007).
source communicates knowledge so it is learned and applied by a recipient (Ko et al. 2005)—has
been studied within organizations in various contexts within the IS literature (Alavi and Leidner
2001). However, as noted above, knowledge transfer can also occur between firms, as when
consultants share knowledge with their clients. For example, researchers have reported survey-
based evidence that providers and consultants can transfer technical knowledge to the client
through interactions (Ko et al. 2005). Better knowledge of how to use the system can increase
post-adoption use of business IT systems (Åstebro 2004). However, to our knowledge, there is
little quantitative evidence on how a provider’s various specific strategies to transfer knowledge
to buyers affect the realized post-adoption consumption level for the offered service.
In this study, in the context of cloud infrastructure services, we focus on one such
strategy that facilitates interactions between providers and buyers, the offering of personalized
technology support, and seek to measure its impact on service consumption volume and
efficiency. Because many of the nuances of cloud deployment are not the norm in traditional
application architectures, there are several reasons why overcoming knowledge barriers may play
an important role in enabling the demand for cloud services. For example, many of the expected
features of enterprise-grade servers, such as redundant components that ensure high availability
and physical access to servers, are not present in the cloud. The cloud requires users to design for
failure (Reese 2009) and consider how to keep an application running if any given server
randomly disappears. Moreover, the cloud’s scaling capabilities can only be truly exploited if the
applications scale out horizontally (i.e., employ several servers performing functions in parallel).
A 2011 survey found that only 25% of IT staff in global organizations had cloud experience with
public infrastructure or platform-as-a-service, and 50% of the organizations claimed that their
staff was “less than somewhat prepared to handle” these services (Symantec 2011).
Thus, it is non-trivial for some of the buyers of cloud infrastructure services to overcome
these knowledge barriers on their own. A provider can greatly assist its buyers in lowering these
barriers and uncertainties by transferring knowledge and by training them, via technology support,
in how to better use the service. In our setting, the provider offers personalized guidance and
training via full support. For example, when offering full support, the provider takes a proactive
approach in helping users configure their software applications so that they effectively scale in
the cloud. This is a common issue for e-commerce applications facing uncertain, customer-driven
IT capacity demand, as new product introductions and marketing campaigns generate temporary
spikes in usage. Thus, full support is different from pure
outsourcing where the provider does everything for the customer and “takes the burden of
learning off the back of a potential user” (Attewell 1992).
Based on the above theoretical arguments, we posit that buyers who adopt and have
continued access to full support use more service compared to similar buyers who only have
access to basic support.
3. Empirical Model
3.1. Effects of Full Support on Service Use
We employ linear fixed effects dynamic panel data models to tease out the effects on cloud use
of the adoption of and continued access to full support. The pay-per-use model provides cloud
infrastructure buyers the freedom to pay only for the computing resources they consume. In our
setting, the provider bundles server capacity in terms of memory (GB of RAM), processing
power (number of virtual CPUs), and storage (GB space of local hard disk). The three attributes
are highly correlated in the offer menu; a server with more of one attribute had more of the other
two. Since the servers are priced based on the amount of memory they have, and memory is the
basis for buyers’ infrastructure sizing decisions, the amount of memory consumed by buyers
over time is a direct measure of their use of cloud services. We compute the average GB of RAM
used by a buyer per month and denote it as Memory_{i,t}. Then, given the strong positive skew in
its distribution, following standard practice we compute lnMemory_{i,t} = ln(Memory_{i,t} + 1) and
use it as our dependent variable. All variables are summarized in Appendix A.
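In code, the log transform of the dependent variable is straightforward; a minimal sketch (the function name is hypothetical, not from the paper):

```python
import math

def ln_memory(avg_gb_ram: float) -> float:
    """lnMemory = ln(Memory + 1): log transform that handles zero-usage months."""
    return math.log(avg_gb_ram + 1.0)

# Example: a buyer averaging 7.88 GB of RAM in a month (the sample mean in Table 1)
print(round(ln_memory(7.88), 3))  # → 2.184
```

Adding 1 before taking the log keeps buyer-months with zero consumption in the estimation sample.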
Our first model tests whether current or prior adoption of full support is associated with greater
memory use:
lnMemory_{i,t} = α + β·FullStatus_{i,t} + Σ_{j=1}^{L} γ_j·lnMemory_{i,t−j} + μ_i + τ_t + ν_{i,t} + ε_{i,t}.   (1)
Subscripts i and t index individual buyers (firms) and time periods (months), respectively.
FullStatus_{i,t} is a binary variable that indicates whether full support was adopted by buyer i by time t,
and is equal to one in all periods after the buyer adopts full support. Thus, β identifies the effects
on cloud use of having adopted full support. After adopting full support, some buyers may opt to
switch back to basic support. As discussed in Appendix B, this switch is associated with a subtle drop
in usage.
We additionally include lagged values of lnMemory_{i,t} to control for persistence in
memory use levels, i.e., buyers' use in prior periods may strongly influence their use in the
focal period. This approach suffers from dynamic panel bias, as it fails the strict exogeneity
assumption commonly employed in fixed effects panel models (Nickell 1981; Roodman 2009a).
We address this bias through System GMM estimation (Anderson and Hsiao 1981; Archak et al.
2011; Arellano and Bond 1991; Arellano and Bover 1995; Blundell and Bond 1998; Ghose
2009). We will show results using 3 lags (L = 3), yet the results are consistent if we use fewer or
more lags (e.g., L = 1 or L = 4). We elaborate on our use of System GMM in the results section.
Parameter μ_i is the buyer fixed effect and τ_t is a vector of calendar month fixed effects.
We also include a vector of dummy variables, ν_{i,t}, indicating in what month of its tenure a buyer
is when month t starts. Finally, ε_{i,t} is our error term, which we assume is correlated
only within individual firms, but not across them.
Our fixed effects model allows us to difference out unobserved time-invariant buyer-level
heterogeneity that may influence both the choice of support type and IT use. We also run our
models using matched subsamples constructed using a coarsened exact matching (CEM)
procedure (Blackwell et al. 2010). CEM reduces the dependence of our estimates on our model
specification and also reduces endogeneity concerns when making causal inferences (Ho et al.
2007). As described in further detail below, we match firms based on their pre-upgrade memory
consumption levels, pre-upgrade frequency of infrastructure resizing (i.e., number of changes in
their total memory use), intended use cases for the cloud service, industry, and size.
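The intuition behind CEM can be sketched as follows. This is a simplified illustration: the bin edges, attribute names, and the omission of resizing frequency and CEM weights are all assumptions for the example, not the paper's actual procedure:

```python
from collections import defaultdict

def cem_strata(units, memory_bins):
    """Coarsened exact matching sketch: coarsen pre-upgrade memory use into
    bins, form strata jointly with the categorical attributes, and keep only
    strata containing both treated (upgraders) and control buyers."""
    def coarsen(gb):
        for i, edge in enumerate(memory_bins):
            if gb < edge:
                return i
        return len(memory_bins)

    strata = defaultdict(list)
    for u in units:
        key = (coarsen(u["pre_memory_gb"]), u["use_case"], u["industry"], u["size"])
        strata[key].append(u)

    # Drop strata without common support (all-treated or all-control)
    return {k: v for k, v in strata.items()
            if any(u["treated"] for u in v) and any(not u["treated"] for u in v)}

units = [
    {"pre_memory_gb": 2.0, "use_case": "web", "industry": "retail", "size": "S", "treated": True},
    {"pre_memory_gb": 2.5, "use_case": "web", "industry": "retail", "size": "S", "treated": False},
    {"pre_memory_gb": 40.0, "use_case": "dev", "industry": "media", "size": "M", "treated": True},
]
matched = cem_strata(units, memory_bins=[1, 4, 16, 64])
print(len(matched))  # → 1 (only the stratum with both a treated and a control buyer)
```

Tightening the bin edges corresponds to moving from CEM1 toward the more stringent CEM2 and CEM3 subsamples.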
Further, we use exogenous failure events experienced by buyers as an instrument for their
support choice decision. When this type of unforeseeable problem occurs, the support
interactions that take place between buyers and the provider can serve as a signal to buyers for
the value of full support. Basic support buyers who, because of the failure, obtain experience in
using the service with a greater involvement from the provider, may be more likely to upgrade to
full support than buyers who do not have such experiences with the provider. However, such
interactions on their own are unlikely to increase use of the provider’s service. Additionally,
since the failures are exogenous (e.g., can occur with equal probability to any server independent
of the support choice), they are also not directly related to any learning or level of technical
sophistication of the buyer. We employ a probit model that has the exogenous failures as
regressors to generate predicted values for FullStatus_{i,t}, which we denote FullStatus-hat_{i,t}. We
then use the fitted value, FullStatus-hat_{i,t}, as our instrument in a standard two-stage least squares
(2SLS) estimation (Angrist and Pischke 2009, pp. 142-144; Imbens and Wooldridge 2007). We
note that the lagged levels of service usage in our model control for the potential correlation
between the size of the cloud infrastructure deployment and its likelihood of experiencing a
failure in some of its components.
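The logic of using failure shocks as an instrument can be illustrated on synthetic data. The sketch below substitutes a simple linear first stage for the probit used in the paper, ignores the panel structure and controls, and uses hypothetical data-generating values throughout:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# An exogenous failure shock (the instrument) raises the chance of upgrading,
# while an unobserved confounder u drives both upgrading and memory use.
z = (rng.random(n) < 0.3).astype(float)          # experienced a failure
u = rng.normal(0, 1, n)                          # unobserved confounder
full = ((0.8 * z + 0.5 * u + rng.normal(0, 1, n)) > 0.7).astype(float)
y = 1.0 + 0.3 * full + 0.5 * u + rng.normal(0, 0.2, n)  # true effect = 0.3

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: project the endogenous upgrade decision on the instrument.
X1 = np.column_stack([np.ones(n), z])
full_hat = X1 @ ols(full, X1)

# Second stage: regress memory use on the fitted upgrade propensity.
beta_iv = ols(y, np.column_stack([np.ones(n), full_hat]))[1]

beta_naive = ols(y, np.column_stack([np.ones(n), full]))[1]
print(round(beta_naive, 2), round(beta_iv, 2))  # naive OLS is biased upward; IV lands near 0.3
```

In the paper the first stage is a probit whose fitted values serve as the instrument; the linear version above only illustrates why instrumenting strips out the bias induced by the unobserved confounder.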
3.2. Time-Varying Effects of Full Support
To allow the marginal effect of switching to full support to vary in a flexible way over time, we
employ indicator variables for the lags of the adoption event, AdoptFull_{i,t}. This variable is
set to 1 only in the period when full support is adopted. Thus, lags of the form AdoptFull_{i,t−k}
indicate whether buyer i adopted full support k periods ago (counting from period t). We use this
indicator in the following autoregressive distributed lag (ARDL) model (Greene 2008, pp. 681-
689):
lnMemory_{i,t} = α + Σ_{k=0}^{m} β_k·AdoptFull_{i,t−k} + β_{m+1}·FullStatus_{i,t−m−1} + Σ_{j=1}^{L} γ_j·lnMemory_{i,t−j} + μ_i + τ_t + ν_{i,t} + ε_{i,t}.   (2)
As with Model (1), we will show results using 3 lags (L = 3) of the dependent variable, yet
results are consistent if we use a different number of lags. We include m = 12 lags of
AdoptFull_{i,t} so that our model identifies the effects of adopting full support during the 12
months following the event. Results are robust to changes in the number of lags. We use
FullStatus_{i,t−m−1} to account for the effect of adopting full support beyond m months in the past.
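Constructing the adoption-event indicator and its lags from a buyer's FullStatus series can be sketched as follows (helper names are illustrative):

```python
def adopt_event(full_status):
    """AdoptFull_t: 1 only in the month the buyer switches to full support."""
    return [int(s == 1 and (t == 0 or full_status[t - 1] == 0))
            for t, s in enumerate(full_status)]

def lagged(series, k):
    """Series lagged k periods (e.g., AdoptFull_{t-k}); first k entries padded with 0."""
    return [0] * k + series[:len(series) - k]

full_status = [0, 0, 0, 1, 1, 1, 1]   # buyer upgrades at month index 3 and keeps full support
adopt = adopt_event(full_status)
print(adopt)             # → [0, 0, 0, 1, 0, 0, 0]
print(lagged(adopt, 2))  # → [0, 0, 0, 0, 0, 1, 0]
```

Each lag of the event dummy is active in exactly one month per upgrader, so its coefficient β_k traces out the marginal effect k months after adoption.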
4. Data and Sample Construction
One of the essential characteristics of cloud infrastructure services is that they are offered on-
demand (Mell and Grance 2011). Buyers only pay hourly rates contingent on server capacity and
operating system. However, there are important technical challenges in deploying horizontally
scalable configurations where several cloud servers work in parallel, which may in turn limit
buyers’ ability to use many servers at once. As mentioned before, the provider offers two levels
of support, basic and full. Under full support, the provider charges a fixed price premium per
server-hour used plus an additional fixed monthly fee (which is prorated on a daily basis). There
are no sign-up or termination fees for the full support service. Please see Appendix C for a
detailed description of the provider’s cloud infrastructure services, their pricing, and the
corresponding levels of technology support. In Appendix D we discuss the potential implications
of server operating system heterogeneity.
We have collected a unique data set on cloud infrastructure services and technology
support use from a provider. The complete data set includes 79,619 buyers that adopted
the provider's services at some point between March 2009 and August 2012.
To isolate the causal effects of full support, we restrict our baseline sample to buyers who are
likely to have similar usage profiles over time, but for their adoption of full support. We exclude
buyers who use the service very little or who do not change their cloud architecture configuration
(i.e., do not resize their infrastructure).3 These buyers have very different time-varying profiles
relative to full support buyers and, although we exclude them ex ante, they likely would also be
3 We exclude buyers who only accessed basic support and averaged 512 MB RAM/hour or less during their first 6 months (excluding the 1st month) or made no adjustments to the size of their infrastructure during their first 6 months (excluding the 1st month). An infrastructure resizing occurs with any launch, halt, or resizing of a server in the buyer's cloud infrastructure. We do not consider behavior during the 1st month in our threshold because most buyers are setting up their infrastructure during this time.
excluded later by our CEM procedures. After these restrictions, our baseline sample includes
22,179 buyers and 368,606 buyer-month observations. Table 1 provides descriptive statistics of
the time-varying cloud use variables in our baseline sample; we will describe our second
dependent variable, FractionParallel_{i,t}, later in Section 5.2, but include it in the table for
completeness. Table 1 also presents statistics contingent on buyers' support choice
(FullStatus_{i,t}); difference-in-means t-tests for all variables are significant at the 1% level.
Table 1. Descriptive Statistics of Time-Varying Variables (Baseline sample, 22,179 buyers)

Support Type Used          Full or Basic                  FullStatus_{i,t} = 0           FullStatus_{i,t} = 1
Observations               368,606                        309,544                        59,062
Variable                   Mean    S.D.   Min  Max        Mean    S.D.   Min  Max        Mean    S.D.   Min  Max
Memory_{i,t}               7.88    31.37  0    2,284.54   7.26    30.92  0    2,284.54   11.11   33.41  0    1,917.40
lnMemory_{i,t}             1.34    1.04   0    7.734      1.30    1.01   0    7.734      1.62    1.15   0    7.56
FullStatus_{i,t}           0.16    0.37   0    1          0       0      0    0          1       0      1    1
FractionParallel_{i,t}     0.12    0.27   0    1          0.12    0.26   0    1          0.13    0.28   0    1
In addition to the buyers’ cloud use data, we have also collected data on the timing and
content of all support interactions through online live chat sessions and support tickets between
the buyers and the provider, starting from October 2009. We provide further details on these data
when we describe our instrumental variables procedure.
Finally, we have collected data from a survey administered to buyers upon signup for a
new account. We use these data to identify the firms' total employment, their intended use case
for the cloud service, and their industry. We combine these three buyer attributes with their
infrastructure usage data to match buyers who exclusively use basic support to buyers who start
with basic support and later upgrade to full support. We use increasingly stringent matching
criteria across our CEM-based subsamples, denoted CEM1, CEM2 and CEM3. Further details on
the signup survey are included in Appendix E, while the construction of our CEM-based
subsamples is described in depth in Appendix F.
5. Results
5.1. Effects of Technology Support on IT Use
We present the results for Model (1) using the baseline sample in Column (1) of Table 2. The
results indicate that buyers who adopt and have access to full support use, on average, 31.84%
(i.e., e^0.276 − 1) more memory than buyers who have access only to basic support. The results with
our various CEM-based subsamples, shown in columns (2) through (4), are consistent with those
obtained with the baseline sample. The percentage changes in memory use associated with the
upgrade from basic to full support range between 34.21% and 37.59%. In the remainder of the
paper, we base our analysis on models run using the CEM1 subsample. We chose this
subsample over the baseline sample because the matching procedure, along with its weights,
reduces our concerns about unobserved heterogeneity between the treated and the control
groups. Further, the CEM1 sample has more observations than CEM2 and CEM3, which allows
for greater precision in our estimates. However, our results remain consistent throughout our
analysis if we use the finer strata of CEM2 or CEM3.
Table 2. Baseline Results for IT Use

Column                     (1)         (2)         (3)         (4)
Sample                     Baseline    CEM1        CEM2        CEM3
FullStatus_{i,t}           0.276***    0.319***    0.318***    0.294***
                           (0.009)     (0.021)     (0.021)     (0.033)
lnMemory_{i,t−1}           0.969***    0.955***    0.960***    1.009***
                           (0.007)     (0.022)     (0.026)     (0.024)
lnMemory_{i,t−2}           −0.187***   −0.186***   −0.191***   −0.260***
                           (0.008)     (0.034)     (0.041)     (0.031)
lnMemory_{i,t−3}           0.035***    0.003       0.013       0.037**
                           (0.004)     (0.018)     (0.021)     (0.017)
Observations               302,833     40,671      31,750      11,201
Buyers                     20,906      2,680       2,023       687
R2                         0.776       0.770       0.784       0.805
Upgrade change (e^β − 1)   31.84%      37.59%      37.46%      34.21%

Dependent variable is lnMemory_{i,t}. All regressions include monthly calendar (τ_t) and tenure dummies (ν_{i,t}). Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
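Because the dependent variable is in logs, a coefficient β on the binary FullStatus indicator translates into a percentage change of 100·(e^β − 1). A quick check (the small gap to the reported 31.84% presumably arises because the paper uses the unrounded coefficient):

```python
import math

def pct_change_from_log_coef(beta: float) -> float:
    """Percentage change in the level of the dependent variable implied by a
    coefficient beta on a dummy in a ln(y) regression: 100 * (e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Coefficient on FullStatus in the baseline sample (Table 2, column 1)
print(round(pct_change_from_log_coef(0.276), 1))  # → 31.8
```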
Instrumental Variables Approach: We have also used the support interaction data to identify
when buyers suffer from exogenous failures in using the cloud service. As discussed above,
these exogenous and unforeseeable shocks force the buyer to interact with the provider, which
serves as a useful signal of the provider’s service capabilities. In particular, buyers discover that
by interacting more closely with the provider they can reduce their total cost of solving their
complications. This motivates them to upgrade from basic to full support, under which they will
be able to continue having similar interactions. Although the number of failures suffered by a
buyer may be correlated with the number of servers it is employing, the lagged values of IT use
control for this risk.
We identify three different types of exogenous failures: generalized outages across the
cloud infrastructure service, network-related failures, and instances in which buyers suffer
degraded performance due to a problem in the physical host in which their virtual machine runs.
The vectors of variables FailOutage_{i,t}, FailNetwork_{i,t}, and FailHost_{i,t} contain dummies
that are turned on if buyers have experienced at least n failures of each corresponding type by
time t. We discuss the construction of these vectors in detail in Appendix G. Given that failures
may have differential effects on the likelihood of upgrading for less experienced buyers, we also
interact these dummies with an indicator of buyers still being in their first trimester (i.e., first 3
months) since signup, which we denote Trimester1_{i,t}. In this section we comment on our
results using 2 dummies of each type (i.e., n = 1, 2), yet our results are consistent using 1 or 3 of
them.
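The cumulative at-least-n failure dummies can be sketched as follows; this is a simplified illustration (the paper's exact construction is in Appendix G):

```python
def failure_dummies(monthly_failures, max_n=2):
    """For one buyer and one failure type, dummy n switches on (and stays on)
    once the buyer has experienced at least n failures by month t."""
    dummies, cumulative = [], 0
    for count in monthly_failures:
        cumulative += count
        dummies.append([int(cumulative >= n) for n in range(1, max_n + 1)])
    return dummies

# A buyer hit by one failure in month index 1 and another in month index 3
print(failure_dummies([0, 1, 0, 1, 0]))
# → [[0, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
```

Because the dummies are absorbing, each one marks the month a buyer crosses the n-th failure threshold and all months thereafter.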
Given the binary nature of our endogenous variable, we follow the approach
suggested by Imbens and Wooldridge (2007) and Angrist and Pischke (2009, pp. 142-144):
as a first step in our estimation process, we use the vector of failure-related indicators and their
interaction with Trimester1_{i,t} in a probit model with FullStatus_{i,t} as the dependent variable.
We use each failure type independently in columns (1) through (3) in Part C of Table 3, and all 3
types of failures in column (4). The results suggest that, as proposed, all failure types are
positively associated with buyers' likelihood of adopting full support. We use the probit model to
generate the fitted values of FullStatus_{i,t}, which we denote FullStatus-hat_{i,t}. Next, we
employ FullStatus-hat_{i,t} as our instrument for FullStatus_{i,t} in a 2SLS estimation procedure.
The first stage results are reported in Part A of Table 3. The values of the F-statistic for the
excluded instruments range between 60.73 and 69.01, and in all cases are significant at the 1%
level. It is evident that FullStatus-hat_{i,t} is positively associated with the real FullStatus_{i,t}.
The second stage results using lnMemory_{i,t} as our dependent variable are reported in
columns (1) through (4) of Table 4. The coefficients for FullStatus_{i,t} are high relative to
models without instruments, yet qualitatively consistent with our baseline results.
While the failure events identified through the support interactions are completely
unexpected to the buyer, their exogeneity can be questioned if one considers that buyers with a
greater number of servers are more likely to suffer at least one failure in any of their servers.
System GMM Estimation and Endogenous Adoption Decisions: To address the above
concern, we employ System GMM estimation methods that consider IT use and support choice
as endogenous and use their lagged values as their instruments. We first find the minimum
number of lags of lnMemory_{i,t} that we can use while finding a valid specification that is neither
overidentified (Hansen 1982) nor suffers from serial correlation (Arellano and Bond 1991). This
leads us to include 3 lags of lnMemory_{i,t} in all our models. Then, we find the minimum number
of lags of the covariates that we can use as instruments to avoid the problem of overfitting the
Table 3. Probit for FullStatus_{i,t} and First Stage Results with Fitted FullStatus-hat_{i,t}

Column                       (1)         (2)         (3)         (4)
Failure Types                Outage      Network     Host        All 3

Part A. First Stage Regression of Fitted FullStatus-hat_{i,t} on Real FullStatus_{i,t}
FullStatus-hat_{i,t}         0.414***    0.577**     0.285***    0.379***
                             (0.071)     (0.250)     (0.096)     (0.070)
lnMemory_{i,t−1}             0.104***    0.107***    0.105***    0.103***
                             (0.007)     (0.007)     (0.007)     (0.007)
lnMemory_{i,t−2}             −0.013***   −0.013***   −0.013***   −0.013***
                             (0.003)     (0.003)     (0.003)     (0.003)
lnMemory_{i,t−3}             −0.001      0.002       0.001       −0.002
                             (0.004)     (0.004)     (0.004)     (0.004)
Observations                 40,648      40,648      40,648      40,648
Buyers                       2,657       2,657       2,657       2,657
R2                           0.108       0.096       0.097       0.107
First Stage F-Statistic      69.01       61.39       60.73       67.61

Part B. Descriptive Statistics of FullStatus-hat_{i,t}
Mean                         0.093       0.095       0.091       0.089
Std. Dev.                    0.100       0.087       0.089       0.098
Min                          0.000       0.000       0.000       0.000
Max                          0.905       0.710       0.681       0.930

Part C. Coefficients of Probit with FullStatus_{i,t} as dependent variable
FailOutage1_{i,t}            0.960***                            0.743***
                             (0.041)                             (0.044)
FailOutage2_{i,t}            0.685***                            0.605***
                             (0.077)                             (0.082)
FailNetwork1_{i,t}                       0.684***                0.251***
                                         (0.066)                 (0.072)
FailNetwork2_{i,t}                       0.067                   −0.873***
                                         (0.234)                 (0.265)
FailNetwork1_{i,t}                       0.250                   0.686
  × Trimester1_{i,t}                     (0.556)                 (0.553)
FailHost1_{i,t}                                      0.432***    0.355***
                                                     (0.027)     (0.027)
FailHost2_{i,t}                                      0.519***    0.176***
                                                     (0.042)     (0.046)
FailHost1_{i,t}                                      −0.236      −0.174
  × Trimester1_{i,t}                                 (0.246)     (0.240)
Constant                     −0.558      −0.397      −0.615      −0.759
                             (0.570)     (0.542)     (0.582)     (0.631)
Observations                 45,815      45,847      45,830      45,802
Pseudo-R2                    0.123       0.085       0.113       0.135

Linear regressions in Part A and probit regressions in Part C include monthly calendar (τ_t) and tenure dummies (ν_{i,t}). Most interactions with Trimester1_{i,t} in Part C are dropped due to collinearity. Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
Table 4. Results with 2SLS and System GMM for lnMemory_{i,t}

Columns (1) through (4) are estimated by 2SLS; columns (5) through (8) by System GMM.

Column                            (1)        (2)        (3)        (4)        (5)         (6)          (7)         (8)
FullStatus_{i,t}                  0.485***   0.280      0.735**    0.543***   0.097***    0.055***     0.103***    0.063***
                                  (0.123)    (0.300)    (0.300)    (0.136)    (0.029)     (0.016)      (0.030)     (0.017)
lnMemory_{i,t−1}                  0.937***   0.959***   0.910***   0.931***   0.941***    0.992***     0.932***    0.959***
                                  (0.026)    (0.039)    (0.040)    (0.027)    (0.062)     (0.053)      (0.061)     (0.055)
lnMemory_{i,t−2}                  −0.183***  −0.186***  −0.180***  −0.183***  −0.024      −0.054       −0.018      −0.019
                                  (0.033)    (0.034)    (0.033)    (0.033)    (0.069)     (0.062)      (0.068)     (0.064)
lnMemory_{i,t−3}                  0.002      0.003      0.001      0.002      −0.004      0.036**      −0.005      0.028*
                                  (0.018)    (0.018)    (0.017)    (0.018)    (0.022)     (0.016)      (0.022)     (0.016)
Observations                      40,648     40,648     40,648     40,648     40,671      40,671       40,671      40,671
Buyers                            2,657      2,657      2,657      2,657      2,680       2,680        2,680       2,680
Failure-based IVs                 Outage     Network    Host       All 3      —           —            All 3       All 3
Lags of first differences as IVs                                              All avail.  Least poss.  All avail.  Least poss.
Total number of IVs                                                           864         296          870         307
Hansen J statistic, p-value                                                   0.810       0.600        0.668       0.238
Upgrade change (e^β − 1)          62.40%     32.37%     108.60%    72.03%     10.20%      5.69%        10.87%      6.51%

Dependent variable is lnMemory_{i,t}. All regressions include monthly calendar (τ_t) and tenure dummies (ν_{i,t}). Columns (1) through (4) show robust standard errors, clustered on buyers, in parentheses. System GMM models in columns (5) through (8) have robust standard errors that use Windmeijer's (2005) finite sample correction. * p < 0.10, ** p < 0.05, *** p < 0.01. The Hansen J statistic is not reported for the 2SLS estimations in columns (1) through (4) as the model is exactly identified. System GMM estimations in columns (5) through (8) consider FullStatus_{i,t} as endogenous. Given AR(2) in the errors, they all use the 2nd lag of the first difference of lnMemory_{i,t} and FullStatus_{i,t} as instruments for the levels equation. Columns (5) and (7) use all available lags of lnMemory_{i,t} and FullStatus_{i,t} as instruments for the first differences equation, from the 3rd lag until the end of the panel. Columns (6) and (8) use only the 3rd lag of lnMemory_{i,t} and the 3rd through 11th lags of FullStatus_{i,t} as instruments for the differences equation. Additionally, columns (7) and (8) augment the instruments matrix with the same vector of exogenous failure-related instruments shown in column (4) of Table 3.
Upgrade,change,("c − 1), 62.40%, 32.37%, 108.60%, 72.03%, 10.20%, 5.69%, 10.87%, 6.51%,Dependent variable is *+!"#$%&',). All regressions include monthly calendar (A)) and tenure dummies (B',)). Columns (1) through (4) show robust standard errors, clustered on buyers, in parentheses. System GMM models in columns (5) through (8) have robust standard errors that use Windmeijer’s (2005) finite sample correction. * F3 < 30.10, ** F3 < 30.05, *** F3 < 30.01. Hansen J statistic not reported for 2SLS estimations in columns (1) through (4) as model is exactly identified. System GMM estimations in columns (5) through (8) consider 45**678759',) as endogenous. Given AR(2) in the errors, they all use the 2nd lag of the first difference of *+!"#$%&',) and 45**678759',) as their instruments for the levels equation. Columns (5) and (7) use all available lags of *+!"#$%&',) and 45**678759',) as instruments for the first differences equation, from the 3rd lag until the end of the panel. Columns (7) and (9) only use the 3rd lag of *+!"#$%&',) and from the 3rd to the 11th lag of 45**678759',) as instruments for the differences equation. Additionally, columns (8) and (9) augment the instruments matrix by considering the same vector of exogenous failure-related instruments shown in column (4) of Table 3.
model with too many instruments (Roodman 2009b). Finally, we augment our instrument matrix
with the exogenous failure-based instruments used in column (4) of Part C of Table 3. The
specifics of these processes are described in Appendix H. Next, we discuss the results of these
various models.
We show the model with all available instruments in column (5) of Table 4. The
coefficient for FullStatus_{i,t} suggests an increase in memory usage of 10.20% (i.e., e^0.097 − 1).
The results with the minimum number of instruments possible are reported in column (6), and we
continue finding a positive and significant effect for full support, this time representing an
increase in memory usage of 5.69% (i.e., e^0.055 − 1). Finally, we augment our instrument matrix
18
for these same model specifications with the exogenous failure-based instruments. The new
results are shown in columns (7) and (8) of Table 4 and do not vary much relative to those
already discussed in columns (5) and (6).
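For concreteness, the "upgrade change" rows in Tables 4, 5, and 7 follow from two small formulas: e^β − 1 for the log outcome (lnMemory) and β × 100 for the fraction outcome (FractionParallel). A minimal sketch of that arithmetic (the function names are ours, not the paper's):

```python
import math

def upgrade_change_log(beta):
    """Implied % change in the level when the outcome is in logs: (e^beta - 1) x 100."""
    return (math.exp(beta) - 1) * 100

def upgrade_change_level(beta):
    """Implied percentage-point change when the outcome is already a fraction: beta x 100."""
    return beta * 100

# Coefficients from Table 4, column (5) (log outcome) and Table 5, column (1) (fraction outcome).
print(round(upgrade_change_log(0.097), 1))    # about a 10.2% increase in memory usage
print(round(upgrade_change_level(0.032), 1))  # about 3.2 more percentage points of servers in parallel
```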
5.2. Effects of Technology Support on Efficiency of IT Use
As mentioned in our theory background section, buyers who access full support may learn from
the provider in ways that enable them to make better use of the cloud service. We test whether
buyers make better and more efficient use of the cloud-specific infrastructure as a
result of having access to full support. An advantage of cloud infrastructure services is that we
can partially observe certain attributes of buyers' deployments, some of which are diagnostic in
assessing how proficient a buyer is in making use of the infrastructure. If full support helps
buyers use the service better, one would expect them to employ architectures that can scale
more efficiently, although this comes at the cost of increased complexity. We explain this
assertion and offer a test of it in the discussion below.
Although the on-demand nature of the service along with its rapid elasticity provides
firms the opportunity to reduce idle computing capacity waste and eliminate the necessity of an
up-front capital commitment in overprovisioning resources (Armbrust et al. 2010; Harms and
Yamartino 2010), doing so requires firms to scale their infrastructure in a cost-efficient manner.
There are essentially two ways of growing an IT infrastructure: vertically and horizontally
(Garcia et al. 2008; Michael et al. 2007; Reese 2009, p. 176). Scaling vertically, while easy to
execute since it generally only implies increasing the capacity of the single server performing a
function, does not allow the buyer to truly leverage the cloud’s scalability. For example, growth
is capped by the maximum server capacity available. In contrast, scaling horizontally with
several servers performing functions in parallel is complex. However, it offers virtually
unlimited growth potential and allows buyers to maintain a more resilient architecture. For further
details on the benefits and complexities of the two scaling methods, see Appendix I.1.
Given this combination of increased efficiency and complexity, we use the fraction of servers
running in parallel as a measure that proxies for a buyer's skill at using cloud computing. We
emphasize that although launching a single server is a trivial task for any system administrator,
launching several of them in a horizontally scalable manner is non-trivial. Additionally, this
measure varies separately from memory use, our first dependent variable: a buyer can consume a
large volume with none of its servers functioning in parallel, in which case the fraction is zero, or
a small volume with all of its servers functioning in parallel, which makes the fraction equal to 1.
To compute this metric we scan the names of the servers used daily by buyers and count, to the
extent possible, how many of them are performing the same functions; the process is explained
in Appendix I.2. The monthly average fraction of servers running in parallel is captured in our
new dependent variable, FractionParallel_{i,t} (see Table 1 for descriptive statistics).
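The paper's name-matching procedure is detailed in Appendix I.2 and is based on Levenshtein distance; the snippet below is only an illustrative sketch of the idea, using Python's difflib similarity in place of Levenshtein distance and hypothetical server names:

```python
from difflib import SequenceMatcher

def parallel_fraction(server_names, threshold=0.8):
    """Group servers whose names are near-duplicates (e.g., 'web01', 'web02')
    and return the fraction of servers that share a group with at least one other."""
    groups = []  # each group is a list of similarly named servers
    for name in server_names:
        for group in groups:
            if SequenceMatcher(None, name, group[0]).ratio() >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    in_parallel = sum(len(g) for g in groups if len(g) > 1)
    return in_parallel / len(server_names) if server_names else 0.0

# Hypothetical deployment: two web servers running in parallel plus one database server.
print(parallel_fraction(["web01", "web02", "db-master"]))  # 2 of 3 servers -> roughly 0.67
```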
We estimate the exact same models described in our empirical approach (Section 3) but
substitute FractionParallel_{i,t} for lnMemory_{i,t} as the dependent variable. Overall, our results
are consistent with the previous findings for the IT service use dependent variable,
providing additional evidence that full support enables buyers to use the cloud more efficiently.
Results in Table 5 show that buyers who have adopted and continue having access to full
support have a fraction of servers working in parallel that is between 3.19 and 4.26 percentage
points higher than that of basic support users; this is substantial considering that mean
FractionParallel_{i,t} is 0.12. We show these models using 2 lags of FractionParallel_{i,t} as
covariates so that the results are comparable to those of the System GMM approach below,
yet the results are consistent if we use different numbers of lags.
Table 5. Results for Tests of Effects of Full Support on Efficiency of IT Use
Column                      (1)        (2)        (3)        (4)
Sample                      Baseline   CEM1       CEM2       CEM3
FullStatus_{i,t}            0.032***   0.040***   0.040***   0.043***
                            (0.002)    (0.005)    (0.005)    (0.008)
FractionParallel_{i,t-1}    0.897***   0.886***   0.881***   0.903***
                            (0.005)    (0.014)    (0.016)    (0.030)
FractionParallel_{i,t-2}    -0.165***  -0.179***  -0.180***  -0.186***
                            (0.004)    (0.011)    (0.012)    (0.023)
Observations                324,406    43,355     33,779     11,888
Buyers                      21,573     2,684      2,029      687
R2                          0.637      0.614      0.611      0.628
Upgrade change (β × 100)    3.19       4.00       3.99       4.26
Dependent variable is FractionParallel_{i,t}. All regressions include monthly calendar (φ_t) and tenure (τ_{i,t}) dummies. Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
As before, we implemented a 2SLS model with exogenous failures as instruments for
FullStatus_{i,t}. The first stage results are reported in Table 6; in all cases the fitted full
support (FullStatusHat_{i,t}) is positively associated with the real full support (FullStatus_{i,t}). Columns
(1) through (4) of Table 7 show the second stage results of 2SLS. When using the instrumented
FullStatus_{i,t}, we find that the fraction of servers running in parallel grows by between 8.82 and 9.49
percentage points after buyers upgrade from basic to full support, using the results with outage-
related failures (column 1) or all failures (column 4). The results using only the network-related
(column 2) and host-related (column 3) failures are not statistically significant; their first stage F-
statistics in Table 6 suggest they may be weaker instruments.
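For readers less familiar with the mechanics behind these columns, here is a self-contained two-stage least squares sketch on synthetic data; the data-generating process and variable names are hypothetical and only mirror the structure of the paper's design (an exogenous instrument, an endogenous regressor, an unobserved confounder):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

z = rng.normal(size=n)                          # exogenous instrument (cf. service failures)
u = rng.normal(size=n)                          # unobserved confounder
x = 0.8 * z + 0.5 * u + rng.normal(size=n)      # endogenous regressor (cf. support status)
y = 0.5 * x + u + rng.normal(size=n)            # outcome; true causal effect of x is 0.5

def ols(w, v):
    """Intercept and slope from least squares of v on w."""
    W = np.column_stack([np.ones(len(w)), w])
    return np.linalg.lstsq(W, v, rcond=None)[0]

beta_ols = ols(x, y)[1]                         # biased: x is correlated with u
a, b = ols(z, x)                                # first stage: x on the instrument z
beta_2sls = ols(a + b * z, y)[1]                # second stage: y on fitted x

print(round(beta_ols, 2), round(beta_2sls, 2))  # OLS overstates the effect; 2SLS is near 0.5
```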
Table 6. First Stage Results with fitted FullStatusHat_{i,t} and lags of FractionParallel_{i,t}
Column                      (1)       (2)       (3)       (4)
Failure Types               Outage    Network   Host      All 3
FullStatusHat_{i,t}         0.581***  0.863***  0.510***  0.558***
                            (0.077)   (0.230)   (0.099)   (0.071)
FractionParallel_{i,t-1}    0.103***  0.112***  0.110***  0.104***
                            (0.014)   (0.014)   (0.014)   (0.014)
FractionParallel_{i,t-2}    -0.009    -0.004    -0.008    -0.013
                            (0.008)   (0.008)   (0.008)   (0.008)
Observations                43,332    43,351    43,341    43,325
Buyers                      2,680     2,680     2,680     2,680
R2                          0.041     0.018     0.025     0.042
First Stage F-Statistic     41.45     26.76     30.38     42.41
Dependent variable is FullStatus_{i,t}. Linear regressions include monthly calendar (φ_t) and tenure (τ_{i,t}) dummies. Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
Continuing with the same models used for our first dependent variable, we also employ
the System GMM approach to treat both FractionParallel_{i,t} and FullStatus_{i,t} as
endogenous. Please refer to Appendix H.2 for the lag and instrument selection
process. We report the System GMM estimation with all available instruments in column (5) of
Table 7, which suggests the fraction increases by 1.35 percentage points once the buyer has
switched to full support. Although the estimation with all available instruments in column (5)
yields statistically significant coefficients for FullStatus_{i,t}, column (6) with the reduced number
of instruments does not. Estimation results in columns (7) and (8) of Table 7 employ an
augmented instruments matrix that incorporates the exogenous service failure events as
additional instruments, and their estimates are consistent with those in columns (5) and (6).
Table 7. Results with 2SLS and SGMM for FractionParallel_{i,t}
Column                    (1)        (2)        (3)        (4)        (5)       (6)       (7)       (8)
Estimation Procedure      (1)-(4): 2SLS                               (5)-(8): System GMM
FullStatus_{i,t}          0.095***   -0.014     0.065      0.088***   0.014**   0.005     0.014**   0.005
                          (0.032)    (0.049)    (0.042)    (0.031)    (0.006)   (0.005)   (0.006)   (0.005)
FractionParallel_{i,t-1}  0.880***   0.892***   0.883***   0.881***   0.801***  0.815***  0.799***  0.805***
                          (0.014)    (0.015)    (0.014)    (0.014)    (0.036)   (0.043)   (0.036)   (0.043)
FractionParallel_{i,t-2}  -0.179***  -0.179***  -0.179***  -0.179***  0.029     0.043     0.031     0.052
                          (0.011)    (0.011)    (0.011)    (0.011)    (0.035)   (0.043)   (0.035)   (0.043)
Observations              43,332     43,351     43,341     43,325     43,355    43,355    43,355    43,355
Buyers                    2,680      2,680      2,680      2,680      2,684     2,684     2,684     2,684
Failure-based IVs         Outage     Network    Host       All 3      -         -         All 3     All 3
Lags of first differences used as IVs                                 All avail. Least poss. All avail. Least poss.
Total Number of IVs                                                   798       482       809       493
Hansen J Statistic p-value                                            0.984     0.345     0.985     0.319
Upgrade change (β × 100)  9.49       -1.35      6.50       8.82       1.35      0.49      1.39      0.51
Dependent variable is FractionParallel_{i,t}. All regressions include monthly calendar (φ_t) and tenure (τ_{i,t}) dummies. Columns (1) through (4) show robust standard errors, clustered on buyers, in parentheses. System GMM models in columns (5) through (8) have robust standard errors that use Windmeijer's (2005) finite sample correction. * p < 0.10, ** p < 0.05, *** p < 0.01. Hansen J statistic not reported for 2SLS estimations in columns (1) through (4) as the model is exactly identified. System GMM estimations in columns (5) through (8) consider FullStatus_{i,t} as endogenous. Given AR(2) in the errors, they all use the 2nd lag of the first difference of FractionParallel_{i,t} and FullStatus_{i,t} as their instruments for the levels equation. Columns (5) and (7) use all available lags of FractionParallel_{i,t} and FullStatus_{i,t} as instruments for the first differences equation, from the 3rd lag until the end of the panel. Columns (6) and (8) only use the 3rd through 12th lags of FractionParallel_{i,t} and the 3rd through 8th lags of FullStatus_{i,t} as instruments for the differences equation. Additionally, columns (7) and (8) augment the instruments matrix with the same vector of exogenous failure-related instruments shown in column (4) of Table 3.
5.3. Time-Varying Effects of Full Support
The estimation results for Model (2) employing both the full sample and the CEM1 subsample
are shown in Table 8. For both dependent variables, all coefficients for the AdoptFull_{i,t-k}
indicators are positive and statistically significant. This initially suggests that full support's effect
does not fade, at least not entirely, over time. The coefficients do not change much if we employ
a different number of lags for the support indicators (r) or the dependent variables (p).
Nevertheless, the precise interpretation of these coefficients is not straightforward.
Since the lags of the dependent variables are influenced by the prior access to full
support, each lag of AdoptFull_{i,t} enters the model in highly complicated ways. To show the
time-varying effects of support we plot the impulse response functions of the dependent
variables to the switch in the support type (i.e., a unit change in a binary variable) (Hamilton
1994, pp. 318-323). Specifically, we compute and plot ∂lnMemory_{i,t} / ∂AdoptFull_{i,t-m} over time m to show how
current memory usage is influenced by the adoption of full support m periods ago. We describe
the estimation procedure in detail in Appendix J.
We show the impulse response functions of the dependent variables to the adoption of full
support in Figure 1. The figure suggests buyers significantly grow their volume of service
consumption (left panel) as well as the proportion of servers they run in parallel (right panel)
immediately after full support adoption. Moreover, the effects of having access to full support
grow over time, as evidenced by the positive slope of the functions. Buyers continue to benefit
from having access to full support over time, and full support has a stronger influence on buyer
behavior the longer it has been accessed.
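The impulse-response computation can be sketched as follows, plugging in the column (1) coefficients from Table 8 purely for illustration (the paper's exact procedure is described in Appendix J):

```python
# Illustrative impulse response of lnMemory to full-support adoption at t = 0.
# Coefficients come from column (1) of Table 8; the FullStatus_{i,t-13} term is
# irrelevant within this 12-month horizon and is ignored.
rho = [0.969, -0.181, 0.035]                        # coefficients on y_{t-1}, y_{t-2}, y_{t-3}
beta = [0.448, 0.417, 0.204, 0.208, 0.211, 0.211,   # coefficients on AdoptFull_{i,t-k},
        0.221, 0.232, 0.228, 0.233, 0.239, 0.238,   # k = 0 .. 12
        0.239]

irf = []
for t in range(13):
    # direct effect of the t-th lag indicator plus AR propagation of earlier responses
    y_t = beta[t] + sum(r * irf[t - j] for j, r in enumerate(rho, start=1) if t - j >= 0)
    irf.append(y_t)

print([round(v, 2) for v in irf])  # rises from about 0.45 at adoption and keeps growing
```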
Table 8. Results with Lags of Full Support Adoption Indicators
Column                 (1)        (2)        (3)        (4)
Dependent Variable     y_{i,t} = lnMemory_{i,t}    y_{i,t} = FractionParallel_{i,t}
Sample                 Baseline   CEM1       Baseline   CEM1
AdoptFull_{i,t}        0.448***   0.480***   0.032***   0.033***
                       (0.017)    (0.040)    (0.004)    (0.009)
AdoptFull_{i,t-1}      0.417***   0.469***   0.051***   0.065***
                       (0.016)    (0.041)    (0.004)    (0.011)
AdoptFull_{i,t-2}      0.204***   0.199***   0.025***   0.030***
                       (0.012)    (0.026)    (0.002)    (0.007)
AdoptFull_{i,t-3}      0.208***   0.221***   0.025***   0.029***
                       (0.009)    (0.025)    (0.003)    (0.008)
AdoptFull_{i,t-4}      0.211***   0.242***   0.028***   0.030***
                       (0.009)    (0.024)    (0.002)    (0.007)
AdoptFull_{i,t-5}      0.211***   0.274***   0.027***   0.040***
                       (0.009)    (0.024)    (0.002)    (0.006)
AdoptFull_{i,t-6}      0.221***   0.269***   0.030***   0.041***
                       (0.009)    (0.025)    (0.002)    (0.007)
AdoptFull_{i,t-7}      0.232***   0.275***   0.032***   0.037***
                       (0.009)    (0.025)    (0.002)    (0.007)
AdoptFull_{i,t-8}      0.228***   0.282***   0.030***   0.042***
                       (0.009)    (0.029)    (0.003)    (0.007)
AdoptFull_{i,t-9}      0.233***   0.271***   0.033***   0.057***
                       (0.010)    (0.025)    (0.003)    (0.011)
AdoptFull_{i,t-10}     0.239***   0.272***   0.032***   0.040***
                       (0.010)    (0.034)    (0.003)    (0.008)
AdoptFull_{i,t-11}     0.238***   0.287***   0.031***   0.046***
                       (0.010)    (0.032)    (0.003)    (0.009)
AdoptFull_{i,t-12}     0.239***   0.299***   0.032***   0.056***
                       (0.011)    (0.026)    (0.003)    (0.009)
FullStatus_{i,t-13}    0.248***   0.277***   0.033***   0.037***
                       (0.010)    (0.026)    (0.003)    (0.008)
y_{i,t-1}              0.969***   0.957***   0.897***   0.886***
                       (0.007)    (0.022)    (0.005)    (0.014)
y_{i,t-2}              -0.181***  -0.175***  -0.165***  -0.179***
                       (0.008)    (0.034)    (0.004)    (0.011)
y_{i,t-3}              0.035***   -0.000
                       (0.004)    (0.018)
Observations           302,833    40,671     324,406    43,355
Buyers                 20,906     2,680      21,573     2,684
R-Squared              0.778      0.772      0.638      0.615
All regressions include monthly calendar (φ_t) and tenure (τ_{i,t}) dummies. Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
Figure 1. Impulse Response Functions of Buyer Behavior to Adoption of Full Support [left panel: lnMemory; right panel: FractionParallel; horizontal axes: months after adopting full support (0-12); each panel plots the Full Sample and CEM1 estimates]
6. Conclusion
To the best of our knowledge, this note provides the first empirical evidence of how a service
provider's technology support influences buyers' post-adoption IT use. We show that enhanced
technology support increases usage, and we provide suggestive evidence that buyer learning
from the provider may be responsible for these patterns. Furthermore, we advance existing
survey-based measures of post-adoption usage by providing detailed transactional data from our
research site on IT services consumption over time. Such data are valuable, as they allow
researchers to evaluate the value to the provider of offering enhanced support services. Taken
together, our results call for a more complete and fully integrated theory of organizational learning
and post-adoption usage in the IS literature.
Our study also has important managerial implications. From the provider’s perspective,
our results highlight the impact of full support on user behavior. Before our study, the provider
who offered us the data, while believing in the value of full support and offering it as a strategic
initiative, was actually uncertain of its precise economic value (while costs were understood, the
impact on the revenue stream was unclear). A rough estimate of the profit gain for the provider
from having a buyer under full rather than basic support suggests the switch yields at least a 140%
increase in profits after accounting for both revenue and support cost increases (see Table 9). Due to the
commoditization of cloud infrastructure services, cloud services have often been thought of as
fully self-serviced, on-demand offerings with minimal interactions between buyers and service
providers (Mell and Grance 2011). For example, Amazon, the largest provider of cloud
infrastructure services, initially did not offer technology support. Our findings suggest that the
Table 9. Estimate of Net Profit Gain from Full Support vs. Basic Support per Buyer
Item                     Basic     Full      Units           Calculation
Support Costs
Number of Chats          0.366     0.702     Quantity/month  Mean number of chats/month
Cost of Chats            $2.73     $5.24     $/month         Quantity × $7.46 (a)
Number of Tickets (b)    0.117     0.650     Quantity/month  Mean number of tickets/month
Cost of Tickets          $4.31     $23.95    $/month         Quantity × $36.83 (a)
Cost of Support          $7.04     $29.19    $/month         Cost of Chats + Cost of Tickets
Cloud Server Profits
Estimated Usage (c)      1,440.0   1,898.5   GB RAM/month    For full, median usage × 1.3184
Server Hourly Rate (d)   $0.045    $0.090    $/GB RAM/hour   Based on AWS pricing
Estimated ARPU (e)       $64.80    $170.86   $/month         Estimated Usage × Hourly Rate
Estimated Profits        $51.84    $136.69   $/month         ARPU × 80% (f)
Difference in Profits
Net Profits              $44.80    $107.51   $/month         Server Profits − Support Costs
Net Profit Gain (abs.)   $62.71              $/month         $107.51 − $44.80
Net Profit Gain (%)      140%                %               $107.51 / $44.80 − 1
(a) These are the estimated costs per chat session and ticket given to us by the provider. (b) We only count buyer-initiated (inbound) tickets; we exclude (outbound) announcements made by the provider through tickets. (c) Median usage under basic support is 2 GB RAM/hour; we multiply by 720 hours/month to get monthly usage under basic support. For full support we consider a 31.84% increase in usage, from the estimate in column (1) of Table 2. (d) During our sample period, Amazon Web Services' (AWS) Elastic Compute Cloud (EC2), the public IaaS with the largest market share and thus the dominant price-setting position, offered small 1.7 GB RAM servers at $0.08/hour and medium 3.75 GB RAM servers at $0.16/hour (source: aws.amazon.com). Based on these rates, we compute the mid-point price for 1 GB RAM/hour at $0.045. This is the price under basic support. For full support, even though the provider adds $0.12 to the hourly rate, we only add $0.045 to attain a conservative estimate. We also ignore the fixed monthly fee charged by the provider to buyers under full support. See Appendix C for more details on pricing. (e) Average Revenue per User. (f) The provider estimates their server-related variable costs at around 20%. These include server and datacenter depreciation expenses, datacenter rent, power and cooling, and non-infrastructure items like credit card fees and bad debt expenses.
buyers’ continuous access to full support in the form of personalized guidance has significant,
quantifiable, and sustainable business value. In this way, our research adds to other recent
findings on the value of service and support in the cloud setting. For example, Retana et al.
(2015) show that proactively providing customers with information about the value of a service
during the onboarding process can decrease customer attrition and the number
of costly support interactions.
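The back-of-the-envelope calculation in Table 9 can be reproduced directly; all inputs below are taken from the table (per-chat and per-ticket costs, usage, hourly rates, and the provider's roughly 80% margin):

```python
def monthly_profit(chats, tickets, usage_gb_ram, hourly_rate, margin=0.80,
                   chat_cost=7.46, ticket_cost=36.83):
    """Net monthly profit per buyer: server profits minus support costs (Table 9)."""
    support_cost = chats * chat_cost + tickets * ticket_cost
    arpu = usage_gb_ram * hourly_rate  # usage is already in GB RAM-hours per month
    return arpu * margin - support_cost

basic = monthly_profit(chats=0.366, tickets=0.117, usage_gb_ram=1440.0, hourly_rate=0.045)
full = monthly_profit(chats=0.702, tickets=0.650, usage_gb_ram=1440.0 * 1.3184, hourly_rate=0.090)

print(round(basic, 2), round(full, 2))  # roughly $44.80 vs $107.51 per month
print(round(full / basic - 1, 2))       # roughly 1.40, i.e., a ~140% profit gain
```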
Our research has included a range of analyses that we have employed to isolate the
effects of full support on IT service use. However, as in any empirical study, our research has
limitations. In particular, while our analyses of user decisions to use horizontally scalable
architectures provide suggestive evidence of learning, we were unable to directly observe
learning using our current research design.
Such limitations offer exciting opportunities for future research. For example, better data
on the productivity of service use would help researchers to more precisely isolate the effects of
provider interactions on customer learning. More broadly, we believe that future work should use
transactional data such as ours to gauge the impact of other buyer interactions with third parties,
such as traditional outsourcing firms and individuals in online communities of practice, to assess
their impact on the manner and effectiveness with which firms use IT. We hope our findings will
encourage additional work in this important area.
References
Alavi, M., and Leidner, D.E. 2001. "Review: Knowledge management and knowledge management systems: Conceptual foundations and research issues," MIS Quarterly (25:1), pp 107-136.
Anderson, T.W., and Hsiao, C. 1981. "Estimation of Dynamic Models with Error Components," Journal of the American Statistical Association (76:375), pp 598-606.
Angrist, J.D., and Pischke, J.-S. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Archak, N., Ghose, A., and Ipeirotis, P.G. 2011. "Deriving the Pricing Power of Product Features by Mining Consumer Reviews," Management Science (57:8), August 1, 2011, pp 1485-1509.
Arellano, M., and Bond, S. 1991. "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," The Review of Economic Studies (58:2), pp 277-297.
Arellano, M., and Bover, O. 1995. "Another Look at the Instrumental Variable Estimation of Error-components Models," Journal of Econometrics (68:1), pp 29-51.
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., and Zaharia, M. 2010. "A View of Cloud Computing," Communications of the ACM (53:4), pp 50-58.
Åstebro, T. 2004. "Sunk Costs and the Depth and Probability of Technology Adoption," The Journal of Industrial Economics (52:3), pp 381-399.
Attewell, P. 1992. "Technology Diffusion and Organizational Learning: The Case of Business Computing," Organization Science (3:1), pp 1-19.
Azoulay, P., Graff Zivin, J.S., and Sampat, B.N. 2011. "The Diffusion of Scientific Knowledge Across Time and Space: Evidence from Professional Transitions for the Superstars of Medicine." National Bureau of Economic Research.
Azoulay, P., Graff Zivin, J.S., and Wang, J. 2010. "Superstar Extinction," Quarterly Journal of Economics (125:2), pp 549-589.
Blackwell, M., Iacus, S., King, G., and Porro, G. 2010. "cem: Coarsened Exact Matching in Stata," Stata Journal (9:4), pp 524-546.
Blundell, R., and Bond, S. 1998. "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models," Journal of Econometrics (87:1), pp 115-143.
Bresnahan, T.F., and Greenstein, S. 1996. "Technical Progress and Co-invention in Computing and in the Uses of Computers," Brookings Papers on Economic Activity: Microeconomics (1996), pp 1-83.
Buell, R.W., Campbell, D., and Frei, F.X. 2010. "Are Self-Service Customers Satisfied or Stuck?," Production and Operations Management (19:6), pp 679-697.
Cameron, A.C., and Trivedi, P.K. 2010. Microeconometrics Using Stata, Revised Edition. College Station, TX: Stata Press.
Casalicchio, E., and Colajanni, M. 2000. "Scalable Web Clusters with Static and Dynamic Contents," IEEE International Conference on Cluster Computing, 2000., pp. 170-177.
Chatterjee, D., Grewal, R., and Sambamurthy, V. 2002. "Shaping Up for E-Commerce: Institutional Enablers of the Organizational Assimilation of Web Technologies," MIS Quarterly (26:2), pp 65-89.
Chen, H., De, P., and Hu, J. 2015. "IT-Enabled Broadcasting in Social Media: An Empirical Study of Artists’ Activities and Music Sales," Information Systems Research (forthcoming).
Cherkasova, L. 2000. "FLEX: Load Balancing and Management Strategy for Scalable Web Hosting Service," Fifth IEEE Symposium on Computers and Communications (ISCC 2000), Antibes, France, pp. 8-8.
Chwelos, P., Benbasat, I., and Dexter, A.S. 2001. "Research Report: Empirical Test of an EDI Adoption Model," Information Systems Research (12:3), pp 304-321.
Emison, J.M. 2013. "2013 State of Cloud Computing."
Fichman, R.G. 2004. "Real Options and IT Platform Adoption: Implications for Theory and Practice," Information Systems Research (15), pp 132-154.
Fichman, R.G., and Kemerer, C.F. 1997. "The Assimilation of Software Process Innovations: An Organizational Learning Perspective," Management Science (43:10), pp 1345-1363.
Fichman, R.G., and Kemerer, C.F. 1999. "The Illusory Diffusion of Innovation: An Examination of Assimilation Gaps," Information Systems Research (10:3), pp 255-275.
Forman, C., Goldfarb, A., and Greenstein, S. 2008. "Understanding the Inputs into Innovation: Do Cities Substitute for Internal Firm Resources?," Journal of Economics & Management Strategy (17:2), pp 295-316.
Frei, F.X. 2008. "The Four Things a Service Business Must Get Right," Harvard Business Review (86:4), pp 70-80.
Furman, J.L., Jensen, K., and Murray, F. 2012. "Governing Knowledge in the Scientific Community: Exploring the Role of Retractions in Biomedicine," Research Policy (41), pp 276-290.
Garcia, D.F., Rodrigo, G., Entrialgo, J., Garcia, J., and Garcia, M. 2008. "Experimental Evaluation of Horizontal and Vertical Scalability of Cluster-based Application Servers for Transactional Workloads," in: 8th International Conference on Applied Informatics and Communications (AIC'08). Rhodes, Greece: World Scientific and Engineering Academy and Society (WSEAS), pp. 29-34.
Ghose, A. 2009. "Internet Exchanges for Used Goods: An Empirical Analysis of Trade Patterns and Adverse Selection," MIS Quarterly (33:2), pp 263-291.
Greene, W.H. 2008. Econometric Analysis, (6th ed.). New Jersey: Pearson Prentice Hall.
Hamilton, J.D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hansen, L.P. 1982. "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica (50:4), pp 1029-1054.
Harms, R., and Yamartino, M. 2010. "The Economics of the Cloud," Microsoft, http://www.microsoft.com/en-us/news/presskits/cloud/docs/the-economics-of-the-cloud.pdf.
Hitt, L.M., Wu, D.J., and Zhou, X. 2002. "Investment in Enterprise Resource Planning: Business Impact and Productivity Measures," Journal of Management Information Systems (19:1), pp 71-98.
Ho, D., Imai, K., King, G., and Stuart, E. 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis (15:3), pp 199-236.
Iacus, S.M., King, G., and Porro, G. 2012. "Causal Inference without Balance Checking: Coarsened Exact Matching," Political analysis (20:1), pp 1-24.
Imbens, G., and Wooldridge, J. 2007. "Control Function and Related Methods," in: NBER Summer Institute - What's New in Econometrics? http://www.nber.org/minicourse3.html.
Ko, D.-G., Kirsch, L.J., and King, W.R. 2005. "Antecedents of Knowledge Transfer from Consultants to Clients in Enterprise System Implementations," MIS Quarterly (29:1), pp 59-85.
Levenshtein, V.I. 1966. "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals," Soviet Physics Doklady (10:8), pp 707-710.
Mell, P., and Grance, T. 2011. "The NIST Definition of Cloud Computing," National Institute of Standards and Technology Information Technology Laboratory (ed.). Gaithersburg, MD.
Michael, M., Moreira, J.E., Shiloach, D., and Wisniewski, R.W. 2007. "Scale-up x Scale-out: A Case Study using Nutch/Lucene," Parallel and Distributed Processing Symposium (IPDPS 2007). IEEE International, pp. 1-8.
Microsoft, and Edge Strategies. 2011. "SMB Cloud Adoption Study Dec 2010 - Global Report," http://www.edgestrategies.com.
Nickell, S. 1981. "Biases in Dynamic Models with Fixed Effects," Econometrica (49:6), pp 1417-1426.
Parthasarathy, M., and Bhattacherjee, A. 1998. "Understanding Post-Adoption Behavior in the Context of Online Services," Information Systems Research (9:4), pp 362-379.
Reese, G. 2009. Cloud Application Architectures: Building Applications and Infrastructure in the Cloud. O'Reilly Media.
Retana, G.F., Forman, C., and Wu, D.J. 2015. "Proactive Customer Education, Customer Retention, and Demand for Technology Support: Evidence from a Field Experiment," Manufacturing and Service Operations Management (forthcoming).
Roodman, D. 2009a. "How to Do xtabond2: An Introduction to Difference and System GMM in Stata," Stata Journal (9:1), pp 86-136.
Roodman, D. 2009b. "A Note on the Theme of Too Many Instruments," Oxford Bulletin of Economics & Statistics (71:1), pp 135-158.
SearchDataCenter.com. 2011. "Data Center Decisions 2011 Survey Special Report," TechTarget.
Singh, J., and Agrawal, A. 2011. "Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires," Management Science (57:1), pp 129-150.
Symantec. 2011. "State of the Cloud Survey," Symantec.
Wu, D.J., Ding, M., and Hitt, L.M. 2013. "IT Implementation Contract Design: Analytical and Experimental Investigation of IT Value, Learning, and Contract Structure," Information Systems Research (24:3), pp 787-801.
Xue, M., and Harker, P.T. 2002. "Customer Efficiency," Journal of Service Research (4:4), pp 253-267.
Xue, M., Hitt, L.M., and Chen, P.-Y. 2011. "Determinants and Outcomes of Internet Banking Adoption," Management Science (57:2), pp 291-307.
Xue, M., Hitt, L.M., and Harker, P.T. 2007. "Customer Efficiency, Channel Usage, and Firm Performance in Retail Banking," Manufacturing & Service Operations Management (9:4), pp 535-558.
Zhu, K., and Kraemer, K.L. 2005. "Post-Adoption Variations in Usage and Value of E-Business by Organizations: Cross-Country Evidence from the Retail Industry," Information Systems Research (16:1), pp 61-84.
Zhu, K., Kraemer, K.L., and Xu, S. 2006. "The Process of Innovation Assimilation by Firms in Different Countries: A Technology Diffusion Perspective on E-Business," Management Science (52:10), pp 1557-1576.
APPENDIX FOR
TECHNOLOGY SUPPORT AND POST-ADOPTION IT SERVICE USE: EVIDENCE FROM THE CLOUD
A. Summary and Description of Variables
The following is a summary and description of all the variables used throughout the analyses, both as covariates in the regressions and as criteria for the coarsened exact matching (CEM) process.
Table A.1. Description of Variables

Memory_{i,t} (Metric): Average GB of RAM memory used by buyer i during month t.
lnMemory_{i,t} (Dependent variable): = ln(Memory_{i,t} + 1).
FractionParallel_{i,t} (Dependent variable): Average proportion of servers run in parallel by buyer i during month t. See Appendix I.2 for construction details.
FullStatus_{i,t} (Support choice status indicator): Indicates if full support was adopted by buyer i by time t. If buyer i used full support for the first time in time period u, then FullStatus_{i,t} = 1{t ≥ u}.
AdoptFull_{i,t} (Support choice indicator): Indicates if full support was adopted by buyer i at time t. If buyer i used full support for the first time in time period u, then AdoptFull_{i,t} = 1{t = u}.
FormerFullStatus_{i,t} (Support choice status indicator): Indicates if buyer i stopped accessing full support (i.e., switched to basic) by time t. If buyer i switched from full to basic support in period w, then FormerFullStatus_{i,t} = 1{t ≥ w}.
δ_i (Fixed effect): Buyer fixed effect. One dummy per buyer i.
γ_t (Fixed effect): Calendar time fixed effect. One dummy per calendar month t in the data.
τ_{i,t} (Fixed effect): Tenure time fixed effect. One dummy per month in buyer i's tenure (i.e., months since adoption of the cloud service). This is an alternative to a buyer tenure term (e.g., Tenure_{i,t}) that allows us to control for the possibility that buyers' use of the service may increase in a nonlinear fashion over time t.
ε_{i,t} (Error term): Assumed to be correlated only within individual buyers, but not across them.
FullStatusHat_{i,t} (Instrument): Fitted value of FullStatus_{i,t} attained from probit models that use failure-related variables as covariates.
FailOutage_{k,i,t} (Instrument^a): Indicates if buyer i has suffered at least k service outage-related failures by time t.
FailNetwork_{k,i,t} (Instrument^a): Indicates if buyer i has suffered at least k network outage-related failures by time t.
FailHost_{k,i,t} (Instrument^a): Indicates if buyer i has suffered at least k physical host-related failures by time t.
Trimester1_{i,t} (Instrument): Indicates if buyer i is in its first trimester since adopting the cloud service at time t.
^a Please see Appendix G for further details on the construction of all the exogenous failure-related instruments.
Continuation of Table A.1

Employees_i (Metric^b): Number of employees at buyer firm i.
lnEmployment_i (Metric^b): = ln(Employees_i).
EmpCat1_i (CEM criterion^b): = 1{Employees_i ≤ 10}. Indicates buyer has 10 or fewer employees.
EmpCat2_i (CEM criterion^b): = 1{11 ≤ Employees_i ≤ 50}. Indicates buyer has between 11 and 50 employees.
EmpCat3_i (CEM criterion^b): = 1{51 ≤ Employees_i ≤ 100}. Indicates buyer has between 51 and 100 employees.
EmpCat4_i (CEM criterion^b): = 1{101 ≤ Employees_i ≤ 250}. Indicates buyer has between 101 and 250 employees.
EmpCat5_i (CEM criterion^b): = 1{250 < Employees_i}. Indicates buyer has more than 250 employees.
UC_HU_i (CEM criterion^b): Indicates buyer has a High Uncertainty Usage use case.
UC_LU_i (CEM criterion^b): Indicates buyer has a Low Uncertainty Usage use case.
UC_BO_i (CEM criterion^b): Indicates buyer has a Back Office Applications use case.
UC_HS_i (CEM criterion^b): Indicates buyer has a Hosting Services use case.
UC_TD_i (CEM criterion^b): Indicates buyer has a Test and Development use case.
^b Please see Appendix F for further details on the construction of all CEM-related variables.
B. Results Considering Switching to Basic
The objective of this appendix is to assess how our results change if we control for full support
buyers' downgrades to basic support. Model (1), presented in section 3.1, has covariate
FullStatus_{i,t}, which indicates if full support was adopted by buyer i by time t. In other words, if
buyer i used full support for the first time in period u, then FullStatus_{i,t} = 1{t ≥ u}. Buyers
accessing full support have the option to downgrade to basic support. Let FormerFullStatus_{i,t}
be a binary variable that signals if buyer i does not have access to full support by the end of the
focal month t but was using full support at the start of the focal month or in some prior month(s).
In other words, if buyer i switched from full support to basic support in period w, then
FormerFullStatus_{i,t} = 1{t ≥ w}. We augment Model (1) with the FormerFullStatus_{i,t} variable
and have
y_{i,t} = α + β₁·FullStatus_{i,t} + β₂·FormerFullStatus_{i,t} + Σ_{l=1}^{L} ρ_l·y_{i,t−l} + δ_i + γ_t + τ_{i,t} + ε_{i,t}.
The indexes and fixed effects are the same as in Model (1). We use this model with both
of our dependent variables, so y_{i,t} ∈ {lnMemory_{i,t}, FractionParallel_{i,t}}. We are interested in
the difference between the coefficient for FullStatus_{i,t} in the models with and without
FormerFullStatus_{i,t}, i.e., β₁′ − β₁, where β₁′ denotes the coefficient from the augmented model.
However, the results must be considered with caution since the decision to downgrade from full
to basic support is potentially endogenous (and we do not have a valid instrument for it).
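For concreteness, the augmented specification can be estimated on simulated data as follows. This is only an illustrative sketch: all data are synthetic, only one lag of the dependent variable is included, and the tenure dummies are omitted because tenure coincides with calendar time in the simulation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: 60 buyers observed for 24 months. Variable names mirror
# Table A.1, but the data and effect sizes here are entirely synthetic.
rng = np.random.default_rng(42)
rows = []
for i in range(60):
    t_up = rng.integers(6, 16)           # month of upgrade to full support
    t_down = t_up + rng.integers(4, 8)   # month of downgrade back to basic
    y = 0.0
    for t in range(24):
        full = int(t_up <= t < t_down)
        former = int(t >= t_down)
        # True effects: beta1 = +0.3, beta2 = -0.15, AR(1) coefficient 0.6
        y = 0.5 + 0.3 * full - 0.15 * former + 0.6 * y + rng.normal(0, 0.1)
        rows.append({"buyer": i, "month": t, "FullStatus": full,
                     "FormerFullStatus": former, "lnMemory": y})
df = pd.DataFrame(rows)

# One lag of the dependent variable (the paper uses up to three lags)
df["lnMemory_l1"] = df.groupby("buyer")["lnMemory"].shift(1)
est = df.dropna()

# Buyer and calendar-month fixed effects enter as dummies; standard errors
# are clustered on buyers, as in the paper's tables.
fit = smf.ols("lnMemory ~ FullStatus + FormerFullStatus + lnMemory_l1"
              " + C(buyer) + C(month)", data=est).fit(
                  cov_type="cluster", cov_kwds={"groups": est["buyer"]})
print(fit.params[["FullStatus", "FormerFullStatus"]])
```

With this data-generating process, the estimated coefficient on FullStatus is positive and the coefficient on FormerFullStatus is negative, mirroring the signs reported in Tables B.1 and B.2.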
The results with lnMemory_{i,t} as dependent variable for all subsamples are shown in
Table B.1. Relative to Model (1)'s results in Table 2, all the coefficients for FullStatus_{i,t} are
higher once we control for the switching-to-basic behavior. The difference between them
(β₁′ − β₁) is statistically significant at the 1% level for the regressions with all subsamples. This
suggests that the coefficients in Table 2, which capture the average effect of full support, may be
underestimating the effect of full support on IT use.
Table B.1. Results for IT Use Considering Switch to Basic

Column                      (1)         (2)         (3)         (4)
Sample                      Baseline    CEM1        CEM2        CEM3
FullStatus_{i,t}            0.288***    0.336***    0.337***    0.319***
                            (0.009)     (0.022)     (0.022)     (0.036)
FormerFullStatus_{i,t}      -0.143***   -0.180***   -0.217***   -0.212***
                            (0.019)     (0.044)     (0.042)     (0.044)
lnMemory_{i,t-1}            0.968***    0.953***    0.957***    1.003***
                            (0.007)     (0.022)     (0.026)     (0.024)
lnMemory_{i,t-2}            -0.187***   -0.185***   -0.189***   -0.258***
                            (0.008)     (0.033)     (0.040)     (0.031)
lnMemory_{i,t-3}            0.035***    0.003       0.013       0.038**
                            (0.004)     (0.018)     (0.021)     (0.017)
Observations                302,833     40,671      31,750      11,201
Buyers                      20,906      2,680       2,023       687
R-squared                   0.777       0.770       0.785       0.806
Upgrade change (e^β₁ − 1)   33.33%      39.91%      40.11%      37.51%

Dependent variable is lnMemory_{i,t}. All regressions include monthly calendar (γ_t) and tenure dummies (τ_{i,t}). Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
In Table B.2 we show the results with FractionParallel_{i,t} as dependent variable for
the model that includes the FormerFullStatus_{i,t} control. In this case, the coefficients for
FullStatus_{i,t} are very similar to those shown in Table 5. The difference between the coefficients
(β₁′ − β₁) is not statistically significant except in the case in which we use the baseline sample in
column (1) of both tables, and even then the difference is only significant at the 10% level.
Table B.2. Results for Efficiency of IT Use Considering Switch to Basic

Column                       (1)         (2)         (3)         (4)
Sample                       Baseline    CEM1        CEM2        CEM3
FullStatus_{i,t}             0.032***    0.040***    0.040***    0.043***
                             (0.002)     (0.005)     (0.006)     (0.008)
FormerFullStatus_{i,t}       -0.008*     -0.002      -0.002      -0.004
                             (0.004)     (0.011)     (0.012)     (0.012)
FractionParallel_{i,t-1}     0.897***    0.886***    0.881***    0.903***
                             (0.005)     (0.014)     (0.016)     (0.030)
FractionParallel_{i,t-2}     -0.165***   -0.179***   -0.180***   -0.186***
                             (0.004)     (0.011)     (0.012)     (0.023)
Observations                 324,406     43,355      33,779      11,888
Buyers                       21,573      2,684       2,029       687
R-squared                    0.637       0.614       0.611       0.628
Upgrade change (β₁ × 100)    3.24        4.02        4.00        4.30

Dependent variable is FractionParallel_{i,t}. All regressions include monthly calendar (γ_t) and tenure dummies (τ_{i,t}). Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
C. Provider Cloud Infrastructure Service and Technology Support Offerings
This appendix offers additional details, beyond those presented in the manuscript, on the
research context and the provider's service characteristics. In our particular
setting, the cloud provider has recognized that the novelty of the service plus the complexities
involved in deploying distributed architectures that best leverage the cloud’s scalability may pose
significant knowledge barriers to buyers attempting to use the service. In response to this, the
provider offers them the option to contract and access full support. We discuss first the pricing
and terms of the cloud infrastructure service offering, and then elaborate on what characterizes
full support.
One of the essential characteristics of cloud infrastructure services is that they are offered
on-demand (Mell and Grance 2011). Buyers only pay for what they use, and nothing else: there
are no sign-up fees, no minimum spending requirements, no periodic subscription fees and –
since buyers can choose to stop using the service at any time – there are no contract termination
penalties either. Moreover, in the particular case of our provider, the computing resources are
offered to buyers at fixed hourly rates that increase in server size or capacity, generally in a
linear fashion. Servers’ capacity is defined in terms of memory (GB of RAM), processing power
(number of virtual CPUs), and local storage (GB space of local hard disk). During our
observation period, the three capacity metrics tend to vary together as a bundle, meaning that
more of one is generally associated with more of the other two, yet prices are set and buyers
usually make infrastructure sizing decisions in terms of memory. Prices also vary depending on
the operating system chosen for a server (e.g., Windows servers cost more than Linux servers),
yet such heterogeneity does not alter our main findings (see Appendix D for a detailed analysis).
Buyers in our context can launch as many servers as they want, of any size, whenever they
want. However, as discussed in section 5.2 and in Appendix I, there are important technical
challenges in deploying horizontally scalable configurations where several cloud servers work in
parallel. These challenges may in turn limit buyers’ ability to use many servers at once. Finally,
there are no usage caps; the only exceptions are that the provider may have limited
hardware installed at its data centers or may take security measures to prevent misuse of its
service (e.g., spamming). In other words, for legitimate buyers, there is no pre-defined cap on
how much they can choose to use the service.
The provider complements its infrastructure offering with full support, which is offered
for a fixed price premium per server-hour used plus an additional fixed monthly fee. For
instance, instead of paying $0.10 per hour for a 2GB RAM Linux server under basic support, a
full support buyer would pay $0.12 more, i.e., $0.22 per hour. Similarly, for the 4GB RAM
server priced at $0.20 per hour under basic support, the full support buyer would pay $0.32 per
hour. The monthly fee is paid as a subscription and is high enough to deter
buyers with very low willingness to pay (e.g., bloggers that use a single very small server). There
are no sign-up or termination fees for the full support service. The only explicit switching cost
from one support level to another is technical rather than monetary: when downgrading from full
support to basic support, because of technical limitations in the service offering (during our
observation period), buyers must redeploy their servers on their own under the new support
regime. The redeployment will involve launching new servers with virgin operating systems (i.e.,
“out of the box”), and then installing and configuring their business applications on them.
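As an illustration of this fee structure, the monthly cost of one server under each support level can be computed as follows. The hourly rates are those quoted above; the monthly full-support fee is not disclosed in the paper, so the $100 figure here is purely hypothetical.

```python
# Back-of-the-envelope monthly cost for one server under each support level.
HOURS_PER_MONTH = 24 * 30  # 720 billing hours in a 30-day month

def monthly_server_cost(base_hourly_rate, full_support=False,
                        support_premium_per_hour=0.12, monthly_fee=100.0):
    """Full support adds a fixed per-hour premium plus a monthly fee."""
    cost = base_hourly_rate * HOURS_PER_MONTH
    if full_support:
        cost += support_premium_per_hour * HOURS_PER_MONTH + monthly_fee
    return cost

basic = monthly_server_cost(0.10)                    # 2 GB Linux server, basic
full = monthly_server_cost(0.10, full_support=True)  # same server, full support
print(round(basic, 2), round(full, 2))  # 72.0 258.4
```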
A prime goal of full support is to educate buyers on how to best use the cloud
infrastructure service and adapt it to their idiosyncratic business needs. When receiving full
support, buyers receive personalized guidance and training, and thus have the opportunity to
learn directly from the provider’s prior experience in deploying applications in the cloud. Buyers
not willing to pay the price premiums will only receive a basic level of support that has limited
scope in the sense that it is intended to aid buyers with issues concerning account management or
overall performance of the infrastructure service. For example, while a full support buyer may be
personally guided step by step on how to deploy a web server through phone conversations, live
chat sessions or support tickets, basic support buyers will be referred to a knowledge base.
Similarly, if a server failed, which happens much more frequently than in traditional datacenter
settings given the commodity hardware employed and the multi-tenant architecture (i.e., multiple
organizations’ virtual servers are hosted in the same and shared physical server), the provider
would work together with full support buyers in solving the issues, while basic support users
would only be notified about the failure, if anything. Thus, basic support buyers do not have
fluid access to external knowledge from the provider and have to rely mostly on their internal
capabilities to co-produce the service.
D. Implications of Server Operating System Heterogeneity
Throughout our econometric approach, we capture the average effect of full support on buyer
behavior while acknowledging that there may be heterogeneity in that average effect. The buyer
choice of a server’s operating system (OS) is one source of such heterogeneity. In this appendix
we examine how OS choice influences our results.
Even though we observe buyer OS choice, it is difficult to hypothesize how variations in
OS choice influence buyer behavior. With the OS, there is variation in prices as well as in
difficulty of system administration. However, it is unclear ex-ante how this relates to the effect
of provider support on buyer behavior. Furthermore, the observed OS usage is an endogenous
choice for which we do not have an instrument. Given these nuances, we have not explicitly
studied how heterogeneity in OS choice influences full support’s effects on buyer behavior.
Having said that, and despite the fact that we lack a valid instrument for buyer OS choice,
there is value in exploring buyer OS choice heterogeneity. In section D.1 we describe buyer
preferences regarding which OS they use. We show that most of them (roughly 85%) tend to use a
single OS throughout their observed tenure; a consequence of this is that our buyer fixed effects
absorb some of the OS-specific heterogeneity. Then, as a robustness check, in section D.2 we run
some variations of our models considering the OS preferences by interacting our support
indicator with OS choice indicators. The results are consistent with our main findings.
D.1. Buyer OS Preferences
During the time span of our data, the provider offered its servers running four different OS, and
we observe which OS each individual server used:
1. Linux: Several distributions, although we do not observe which.
2. Windows: Several versions of Windows Server, although we do not observe which.
3. Red Hat Enterprise Linux.
4. SQL Server: This is actually a Windows Server running SQL Server, yet it was offered under its own price scheme and hence is treated as a separate operating system for this exercise.
Even though there were multiple OS available, as we show in Table D.1, most buyers
either exclusively or at least primarily used a single OS. To determine if a buyer is a user of a
particular OS, we computed the proportion of the total amount of GB RAM-hours consumed by
each buyer over its observed tenure that were consumed under each of the four different OS. Then,
using arbitrary yet high thresholds (from 85% up to 100%), we flag a buyer as a user of a
certain OS if the proportion of service use with that OS is greater than or equal to the defined
threshold. Using these proportions of workloads under each OS and varying thresholds, we
populated each column of Table D.1 as follows:
• Sample: We show data for two samples, the full baseline sample and the CEM1 subsample, to show that the proportions of buyers using each OS do not vary significantly with the matching process.
• Threshold: The percentage of total usage under a specific OS used to flag a buyer as a user of that OS.
• Linux, Windows, Red Hat, and SQL: The proportion of buyers who used at least the threshold share of their total usage under each corresponding OS. For example, 57.35% of buyers in the baseline sample used at least 99% of all their GB RAM-hours on Linux servers.
• Only 1: Given a certain threshold, the proportion of total buyers that used only a single OS. The column is the sum of the four OS columns to its left.
• Mixed: Given a certain threshold, the proportion of total buyers that used a mix of more than a single OS. The "Only 1" column and this column add up to 100%.
The main takeaway from Table D.1, and in particular from the "Only 1" column, is that
most customers primarily use a single OS. For instance, 66.96% of buyers in the baseline sample
ran all their servers using a single OS, and 80.13% ran at least 95% of their workloads using a
single OS.
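The flagging procedure just described can be sketched as follows; the sample records and column names are illustrative, not actual data from the provider.

```python
import pandas as pd

# A buyer is flagged as a user of an OS if at least `threshold` of its
# lifetime GB RAM-hours ran on that OS; otherwise it is flagged as Mixed.
usage = pd.DataFrame({
    "buyer":        [1, 1, 2, 3, 3],
    "os":           ["Linux", "Windows", "Linux", "SQL", "Windows"],
    "gb_ram_hours": [950.0, 50.0, 360.0, 70.0, 50.0],
})

def flag_os_users(usage, threshold=0.85):
    # Share of each buyer's total GB RAM-hours consumed under each OS
    shares = usage.pivot_table(index="buyer", columns="os",
                               values="gb_ram_hours", aggfunc="sum",
                               fill_value=0.0)
    shares = shares.div(shares.sum(axis=1), axis=0)
    flags = shares.ge(threshold)
    flags["Mixed"] = ~flags.any(axis=1)  # no single OS reaches the threshold
    return flags

flags = flag_os_users(usage)
# Buyer 1: 95% Linux -> Linux user; buyer 3: 70/120 on SQL -> Mixed.
```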
Table D.1. Proportion of Buyers Using Each OS under Different Thresholds

Sample: Baseline (N = 22,179)
Threshold   Linux     Windows   Red Hat   SQL      Only 1    Mixed
100%        52.38%    10.47%    2.31%     2.13%    66.96%    33.04%
99%         57.35%    13.29%    2.81%     2.36%    75.49%    24.51%
95%         60.07%    14.78%    3.09%     2.52%    80.13%    19.87%
90%         61.60%    15.89%    3.24%     2.65%    83.05%    16.95%
85%         62.79%    16.78%    3.42%     2.83%    85.49%    14.51%

Sample: CEM1 (N = 2,685)
Threshold   Linux     Windows   Red Hat   SQL      Only 1    Mixed
100%        57.09%    9.65%     0.67%     0.74%    68.04%    31.96%
99%         62.16%    12.40%    0.86%     0.93%    76.24%    23.76%
95%         64.80%    13.74%    1.01%     0.97%    80.41%    19.59%
90%         66.44%    14.97%    1.15%     1.12%    83.58%    16.42%
85%         67.56%    15.83%    1.38%     1.42%    86.07%    13.93%
To put this proportion into perspective, recall that the median buyer in the sample consumes
an average of 0.5 GB RAM (or 512 MB RAM) per hour over its tenure. Thus, over a month, a
median buyer consumes 0.5 GB RAM/h × 24 h/day × 30 days/month = 360 GB RAM-hours per
month. If a buyer uses the same OS for at least 95% of its workload, then it is using some other
OS for at most 18 GB RAM-hours during a month. This level of usage is equivalent to running a
very small, 256 MB RAM (0.25 GB RAM) server for 3 days of the month (i.e., 72 h). Even for a
threshold of 85%, the remaining 15% is 54 GB RAM-hours during the month, or 9 days of a very
small 256 MB RAM server.
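These back-of-the-envelope figures can be verified explicitly:

```python
# The arithmetic above, stated explicitly (pure unit bookkeeping).
hours_per_month = 24 * 30                      # 720 h in a 30-day month
median_use = 0.5 * hours_per_month             # 360 GB RAM-hours per month

other_os_95 = median_use * 5 / 100             # 18 GB RAM-hours (95% threshold)
other_os_85 = median_use * 15 / 100            # 54 GB RAM-hours (85% threshold)

small_server_gb = 0.25                         # a very small 256 MB server
days_95 = other_os_95 / small_server_gb / 24   # days of small-server use
days_85 = other_os_85 / small_server_gb / 24
print(days_95, days_85)  # 3.0 9.0
```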
We feel that such levels of usage (e.g., a very small server during 9 days per month) are
inconsistent with running production applications, even if they are only used for short time
spans. Even a small blog would run on that 256 MB RAM server, but for an entire month (i.e., 30
days), not 9 days, and any standard application will traditionally need at least a 512 MB RAM
server, twice as large as this one. In other words, we are confident that customers who use at
least 85% of their workloads on a single OS can be characterized as users of that OS.
We find this is the case for 85.49% of the buyers in the baseline sample and 86.07% of the
buyers in the CEM1 subsample.
D.2. Differentiated Technology Support Effects
Using the 85% usage threshold to flag a buyer as a user of a specific OS, Table D.1 shows that
the majority of buyers, between 63% and 68%, primarily use Linux as their OS. The next most
popular OS is Windows, which is used by between 16% and 17% of the buyers. Red Hat and
SQL, together, account for only between 3% and 6% of the buyer base. Finally, only around 14%
of the buyers use a mix of OS.
We next explore whether full support has different effects on buyer behavior conditional on the
OS being used by each buyer. We interact the support choice indicator in Model (1),
FullStatus_{i,t}, with the OS choice indicators and include the interaction terms in our model. Let
OS_i, OS_i ∈ {Win_i, RH_i, SQL_i, Mix_i}, indicate whether a buyer consumed at least 85% of its
workload using Windows, Red Hat, SQL Server, or a mix of OS, respectively. Since the majority
of users employ Linux, we leave this group excluded from our vector of dummies. Then, we
interact these indicators with our support choice indicators as follows:

y_{i,t} = α + β₁·FullStatus_{i,t} + β_OS·FullStatus_{i,t}×OS_i + Σ_{l=1}^{L} ρ_l·y_{i,t−l} + δ_i + γ_t + τ_{i,t} + ε_{i,t},

where y_{i,t} ∈ {lnMemory_{i,t}, FractionParallel_{i,t}} and OS_i ∈ {Win_i, RH_i, SQL_i, Mix_i}.
Coefficient β₁ identifies the effect of full support on Linux users, while the sum β₁ + β_OS does
the same for the users of each individual OS. The model does not include stand-alone OS
indicators (e.g., Win_i) because such non-time-varying attributes are already absorbed by the
buyer fixed effects (δ_i).
The results for the regressions with the interaction terms are shown in Table D.2. The first
four columns show the results for lnMemory_{i,t} while the remaining ones do the same for
FractionParallel_{i,t}. We include results using both the baseline and the CEM1 samples to show
that the matching procedure does not significantly alter our results. In the odd-numbered columns
we show the results without the OS-specific interaction terms, as they appear in the manuscript,
so that they can be compared to the results with the interaction terms in the even-numbered
columns. Finally, in the lower rows of the table we show the percentage changes in lnMemory_{i,t}
and the discrete changes in FractionParallel_{i,t} that occur as a consequence of accessing full
support (FS) under each different OS. The stars (*) in these rows correspond to the p-values of
the tests of the sum of the corresponding coefficients (i.e., β₁ + β_OS) being equal to zero.
Table D.2. Results with OS-Specific Interaction Terms

Dependent variable           ----------- lnMemory_{i,t} -----------   ------- FractionParallel_{i,t} -------
Column                       (1)        (2)        (3)       (4)       (5)       (6)       (7)       (8)
Sample                       Baseline   Baseline   CEM1      CEM1      Baseline  Baseline  CEM1      CEM1
FullStatus_{i,t}             0.276***   0.255***   0.319***  0.286***  0.032***  0.032***  0.040***  0.039***
                             (0.009)    (0.011)    (0.021)   (0.024)   (0.002)   (0.003)   (0.005)   (0.007)
FullStatus_{i,t} × Win_i                -0.015               0.107               -0.009              0.002
                                        (0.030)              (0.070)             (0.006)             (0.014)
FullStatus_{i,t} × RH_i                 0.163***             0.222***            0.019               0.042
                                        (0.039)              (0.083)             (0.012)             (0.041)
FullStatus_{i,t} × SQL_i                0.141**              0.195***            -0.013              -0.011
                                        (0.055)              (0.063)             (0.012)             (0.027)
FullStatus_{i,t} × Mix_i                0.053***             0.042               -0.001              0.000
                                        (0.017)              (0.045)             (0.005)             (0.011)
y_{i,t-1}                    0.969***   0.968***   0.955***  0.954***  0.897***  0.897***  0.886***  0.886***
                             (0.007)    (0.007)    (0.022)   (0.022)   (0.005)   (0.005)   (0.014)   (0.014)
y_{i,t-2}                    -0.187***  -0.187***  -0.186*** -0.185*** -0.165*** -0.165*** -0.179*** -0.179***
                             (0.008)    (0.008)    (0.034)   (0.034)   (0.004)   (0.004)   (0.011)   (0.011)
y_{i,t-3}                    0.035***   0.035***   0.003     0.003
                             (0.004)    (0.004)    (0.018)   (0.018)
Observations                 302,833    302,621    40,671    40,654    324,406   324,176   43,355    43,337
Buyers                       20,906     20,892     2,680     2,679     21,573    21,555    2,684     2,683
R-squared                    0.776      0.776      0.770     0.770     0.637     0.637     0.614     0.614
FS change (Linux)            31.84%***  29.08%***  37.59%*** 33.17%*** 3.19***   3.23***   4.00***   3.87***
FS change (Windows)                     27.21%***            48.23%***           2.36***             4.05***
FS change (Red Hat)                     51.97%***            66.21%***           5.18***             8.05**
FS change (SQL)                         48.65%***            61.90%***           1.89                2.81
FS change (Mixed)                       36.11%***            38.88%***           3.16***             3.87***

All regressions include monthly calendar (γ_t) and tenure dummies (τ_{i,t}). Robust standard errors, clustered on buyers, in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
The results in Table D.2 are generally consistent with our main findings. However, we refrain
from a detailed analysis of the coefficients since OS choice is an endogenous variable for which
we do not have an instrument.
E. Description of Origin of Survey Data
The survey is optional and administered as part of the online signup web form; the response rate
is 43.4%, and we have not found systematic differences between respondents and non-
respondents. The survey was first administered in June 2010, and we have all buyers’ responses
until February 2012. Although there can only be one survey response per account, since buyers
can have multiple accounts, we may also have multiple responses per buyer. In our data we have
6,152 survey responses from 5,565 different buyers in the baseline sample, 431 of which
changed their response to at least one item across their surveys. However, for 42.3% of the
buyers with varying responses, the time gap between the survey responses is too short (i.e., less
than 3 months) to suggest that the variance is due to changes in firms' sizes or goals. Given
this, we do not rely on variance across responses for our analysis and rather only consider the
5,134 buyers that either have a single survey response or that have consistent responses across all
their submissions. Further, we have not considered firm attributes in the survey as controls in our
models since they do not vary over time and thus would be absorbed by the firm fixed effect. We
use 3 of the items in the survey: the firms’ total employment, their intended use case for the
cloud infrastructure service, and their industry. For employment, the survey asks buyers to
indicate their range of employment and we convert the survey’s ranges to numerical values by
taking the mean value of each range (e.g., we convert “From 51 to 100” to 75).
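The range-to-number conversion can be sketched as follows; the survey's exact label format is not reproduced in the paper, so the format parsed here is an assumption.

```python
# Convert an employment-range label to a single numeric value by taking the
# integer mean of its endpoints, matching the paper's example of 75 for
# "From 51 to 100". The "From X to Y" label format is hypothetical.
def employment_from_range(label):
    _, lo, _, hi = label.split()
    return (int(lo) + int(hi)) // 2

print(employment_from_range("From 51 to 100"))  # 75
```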
F. Description of CEM Procedures and Subsamples
F.1. Overview of CEM Procedure
We run our models on subsamples defined using a coarsened exact matching (CEM) procedure
(Blackwell et al. 2010; Iacus et al. 2012). For matching purposes, we consider buyers who
adopted full support at any point in their tenure as treated and those that relied exclusively on
basic support as controls. Matching reduces endogeneity concerns (Ho et al. 2007), and CEM has
been used extensively in recent work to improve the identification of appropriate control groups
in difference-in-differences estimation (e.g., Azoulay et al. 2011; Azoulay et al. 2010; Furman et
al. 2012).
CEM is particularly convenient for our setting because it is a nonparametric procedure
that does not require the estimation of propensity scores. This is useful because, aside from the
exogenous failures, we have limited data that would allow us to directly predict the likelihood of
full support. Each unique vector formed by combinations of the coarsened covariates describes a
stratum. Since the number of treated and control observations in each stratum may differ,
observations are weighted according to the size of their strata (Iacus et al. 2012). Before
weighting, the differences in means between the treated and the controls across the various
matching variables are almost all statistically significant; however, once we apply the CEM
weights the samples are perfectly balanced and any mean differences are eliminated (see Table
F.8 for descriptive statistics with weights applied). All our regressions with CEM-based subsamples employ these
weights. When exact matching is possible, such that for every treated observation there is a
control observation identical to the first one across all possible covariates except for the
treatment, a simple difference in means of the dependent variables would provide an estimate of
the causal effect of interest. Nonetheless, since it is nearly impossible to use exact matching in
observational data and thus there is always a concern about the influence of omitted variables,
we continue using our fixed effects panel data model to control for them.
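The weighting scheme described above can be sketched as follows, using the standard CEM weight for controls, (m_C/m_T)·(m_T^s/m_C^s), where m_T and m_C are the total matched treated and control counts and m_T^s and m_C^s are their counts in stratum s (Iacus et al. 2012); the data and stratum labels below are illustrative.

```python
import pandas as pd

# Treated units get weight 1; controls in each stratum are reweighted so that
# weighted controls mirror the treated distribution across strata.
df = pd.DataFrame({
    "stratum": ["a", "a", "a", "b", "b", "b", "b"],
    "treated": [1, 0, 0, 1, 1, 0, 0],
})

def cem_weights(df):
    m_t = df["treated"].sum()               # matched treated overall
    m_c = len(df) - m_t                     # matched controls overall
    grp = df.groupby("stratum")["treated"]
    m_t_s = grp.transform("sum")            # treated in the stratum
    m_c_s = grp.transform("count") - m_t_s  # controls in the stratum
    w = pd.Series(1.0, index=df.index)
    is_control = df["treated"] == 0
    w[is_control] = ((m_c / m_t) * (m_t_s / m_c_s))[is_control]
    return w

df["w"] = cem_weights(df)
# Weighted controls now total m_c, with stratum-by-stratum balance.
```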
We match buyers based on five attributes: (1) level of IT use (i.e., memory use), (2)
frequency of cloud infrastructure resizing (i.e., how often buyers launch a server, halt a server, or
resize an existing one), (3) employment, (4) intended use case for the cloud infrastructure
service, and (5) industry. The first two attributes are derived directly from firms’ observed usage
of the cloud service. The latter three attributes come from the optional signup survey described in
Appendix E. The precise matching criteria are described below in section F.2.
For the matching process, we only consider treated buyers who started using the cloud
service with basic support and upgraded to full support later on. This allows us to match the
upgraders to controls based on their usage behavior before they adopted full support, had the
controls adopted full support in the same month of their tenure. This approach, which is similar
to the one implemented by Azoulay et al. (2010) and Singh and Agrawal (2011), ensures to the
extent possible that treated firms do not exhibit differential usage behavior before they adopt full
support relative to controls. Among the 5,134 buyers for which we have all this data (i.e., they
answered the signup survey), 1,259 are treated and 3,875 are potential controls. Using the five
criteria described above, we develop 3 different weighted matched subsamples. Each of the three
subsamples is built with increasingly stringent matching criteria, which in turn produces finer
strata (i.e., fewer buyers satisfy the matching criteria used) and reduces our concerns about
having overly coarse strata. The details of the subsample construction are offered in section F.3.
F.2. CEM Matching Criteria
Five different attributes of firms were used to match treated and controls. In this section we
describe each of them as well as the binning done within each attribute. Their corresponding
descriptive statistics are shown in Table F.1.
Table F.1. Descriptive Statistics of Variables Used for CEM before Matching (5,134 buyers)

                  All buyers (N = 5,134)          Controls (N = 3,875)            Treated (N = 1,259)
Variable          Mean    S.D.     Min    Max     Mean    S.D.     Min    Max     Mean    S.D.     Min    Max
Employees_i       195.7   1,102.4  2      10,000  164.7   1,019.9  2      10,000  291.0   1,320.3  2      10,000
lnEmployment_i    2.402   1.706    1.099  9.21    2.26    1.608    1.099  9.21    2.838   1.914    1.099  9.21
EmpCat1_i         0.656   0.475    0      1       0.692   0.462    0      1       0.546   0.498    0      1
EmpCat2_i         0.198   0.398    0      1       0.187   0.390    0      1       0.230   0.421    0      1
EmpCat3_i         0.050   0.218    0      1       0.044   0.204    0      1       0.071   0.256    0      1
EmpCat4_i         0.037   0.188    0      1       0.030   0.171    0      1       0.056   0.231    0      1
EmpCat5_i         0.060   0.237    0      1       0.047   0.213    0      1       0.097   0.296    0      1
UC_HU_i           0.463   0.499    0      1       0.469   0.499    0      1       0.447   0.497    0      1
UC_LU_i           0.591   0.492    0      1       0.573   0.495    0      1       0.647   0.478    0      1
UC_BO_i           0.189   0.391    0      1       0.195   0.396    0      1       0.169   0.375    0      1
UC_HS_i           0.092   0.289    0      1       0.093   0.290    0      1       0.088   0.284    0      1
UC_TD_i           0.293   0.455    0      1       0.323   0.468    0      1       0.203   0.402    0      1
IT Use and Frequency of Infrastructure Resizing: With regard to overall use (i.e., memory use)
and frequency of infrastructure resizing, when creating our baseline sample we had already
discarded basic support users with very small and/or rather static deployments over the early
periods of their tenure. We excluded buyers who averaged 512 MB RAM/hour or less during
their first 6 months (excluding the 1st month) or made no adjustments to the size of their
infrastructure during that same window. Nonetheless, even among the remaining buyers
there is considerable variation in these two variables.
The average memory usage and the frequency of infrastructure resizing actions used to
match treated and controls were computed as follows. Assume that a given treated buyer adopted
the service in some period t_0 and switched from basic to full support in a later time period t_FS,
with t_FS > t_0. Then, we consider the set of controls (i.e., buyers who exclusively used basic support)
who also adopted the service in month t_0 and used the service (i.e., did not churn) at least up to
t_FS. This ensures all buyers were using the service during the same calendar time frame and have
very similar tenure by period t_FS. For the treated buyer and all these controls, we compute the
average memory usage and the frequency of scaling actions in the periods during which all buyers
were using basic support: from t_0 up to t_FS − 1. Finally, we use this metric, which represents their
pre-upgrade behavior, to match buyers.
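The calendar-matched construction above can be sketched in a few lines. The long-format panel layout, the column names, and the function itself are our own illustration (assuming integer calendar-month indices), not the authors' code.

```python
import pandas as pd

def pre_upgrade_metrics(panel: pd.DataFrame, t0: int, t_fs: int) -> pd.DataFrame:
    """Average memory use and number of resizing actions over [t0, t_fs - 1]
    for buyers who adopted in t0 and were still active just before t_fs.

    `panel` has one row per buyer-month with columns:
    buyer, month (integer calendar index), mem_gb, n_resize.
    """
    window = panel[(panel["month"] >= t0) & (panel["month"] < t_fs)]
    # Keep buyers observed in every month of the window, i.e. buyers who
    # adopted in t0 and did not churn before the treated buyer's upgrade.
    months_needed = t_fs - t0
    active = window.groupby("buyer")["month"].nunique()
    window = window[window["buyer"].isin(active[active == months_needed].index)]
    return window.groupby("buyer").agg(
        avg_mem_gb=("mem_gb", "mean"),
        total_resizes=("n_resize", "sum"),
    )
```

Matching a treated buyer then amounts to comparing its row of this frame against the rows of its calendar-matched controls.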
For average memory usage, we set our cutoff points at standard server sizes: 512MB,
1GB, 2GB, 4GB, 8GB, 16GB, 32GB and 64GB of RAM. For frequency of infrastructure
resizing we base our cutoff points on percentiles of the distribution: the 25th percentile is a single
change to the size of the deployment, the 50th percentile is 3 changes, the 75th percentile is 9
changes, and the 95th percentile is 43 changes. In total, we have 9 categories of memory usage
and 5 categories of frequency of infrastructure resizing to match on.
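The coarsening into these bins can be illustrated as follows. The bin edges come from the text; the boundary convention (which bin a value exactly equal to a cutoff falls into) and the function names are our assumptions.

```python
import bisect

# Cutoffs from the text: standard server sizes (GB) and resizing-count
# percentiles. Each list entry is the upper bound of one bin.
MEM_CUTOFFS_GB = [0.5, 1, 2, 4, 8, 16, 32, 64]  # 9 memory bins
RESIZE_UPPER = [0, 2, 9, 43]                     # bins: 0, 1-2, 3-9, 10-43, >43

def mem_bin(avg_mem_gb: float) -> int:
    """0 = up to 512MB, ..., 8 = more than 64GB.
    Values exactly on a cutoff fall in the lower bin (our convention)."""
    return bisect.bisect_left(MEM_CUTOFFS_GB, avg_mem_gb)

def resize_bin(n_changes: int) -> int:
    """0 = no changes, 1 = 1-2, 2 = 3-9, 3 = 10-43, 4 = more than 43."""
    return bisect.bisect_left(RESIZE_UPPER, n_changes)
```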
Employment: The employment, intended use case, and industry data are all collected from the
signup survey. For the employment cutoff points, we broadly rely on the ranges used in the
survey. Among the buyers with consistent survey responses across all their accounts, 66%
indicated they have 10 or fewer employees (EmpCat1), so we use 10 as our first cutoff point.
Another 20% indicated they have between 11 and 50 employees (EmpCat2), making this our
next cutoff point. We subdivide the remaining 15% of buyers into three bins, each accounting for
roughly 5% of our sample: from 51 to 100 (EmpCat3), from 101 to 250 (EmpCat4), and
greater than 250 (EmpCat5). Detailed descriptive statistics of each category (i.e., EmpCat1
through EmpCat5) are shown in Table F.1.
Intended Use Case: The intended use case is collected by a multiple choice question (i.e.,
“Mark all that apply”) that asked buyers to “Please indicate what solution(s) you intend to use
[the cloud infrastructure service] for.” The 20 options available to buyers are very specific, and
finding matches across such specific use cases would be extremely hard. Instead, we group the
specific use cases into 3 more general use cases based on two dimensions: if the use case is
related to back office or front office applications, and, in the latter case, if it is likely that the
volume of usage for the use case is predictable or not. Our first general use case, which we call
“High Usage Uncertainty” (UC_HU), includes customer-facing websites that are prone to
unpredictable variance in their volume of usage. Examples of such use cases are social media
sites, online gaming sites, online publishing sites, rich media sites (e.g., audio or video), and
other Software-as-a-Service (SaaS) offerings. Our second general use case, “Low Usage
Uncertainty” (UC_LU), includes customer-facing websites used for the regular operation of the firm
that have steady, or at least predictable, use levels. Examples are corporate websites, collaboration
platforms, online portals, and e-commerce sites. We chose to include e-commerce sites in this
general use case since, although their usage may have high variance, seasonality makes the peaks and
valleys of demand fairly predictable. Our “Back Office Applications” general use
case (UC_BO) includes applications or systems used internally for business operations.
Examples are a company’s intranet and systems used for accounting, customer relationship
management, human resources, supply chain management, or backup. Finally, we consider
web hosting services (UC_HO) and test and development environments (UC_TD) as two
further general use cases. Altogether, we have 5 general use cases, and the proportion of firms
that marked each of them is shown in Table F.1.
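The mapping from specific survey options to general use cases might be sketched like this. The specific option labels below are invented examples (the survey's exact 20 options are not listed here); the general-category codes follow the UC naming used in this appendix.

```python
# Hypothetical mapping from specific survey options to general use cases.
GENERAL_UC = {
    "social media site": "UC_HU", "online gaming site": "UC_HU",
    "corporate website": "UC_LU", "e-commerce site": "UC_LU",
    "accounting system": "UC_BO", "company intranet": "UC_BO",
    "web hosting": "UC_HO", "test/dev environment": "UC_TD",
}

def general_use_cases(selected):
    """Map a buyer's multiple-choice selections ("mark all that apply")
    to the set of general use-case flags, sorted for determinism."""
    return sorted({GENERAL_UC[s] for s in selected if s in GENERAL_UC})
```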
Industry: Finally, we incorporate an additional question on buyers’ industries in the survey to
make an even more stringent match of treated buyers to controls. Although the survey item does
not follow any standard industry categorization (e.g., NAICS or SIC codes), it does provide
information on buyers’ broad industries. The most popular industries are IT services (15.75%),
web development or design (11.11%), software (10.67%), e-commerce (9.01%), consulting
(5.60%), advertising (5.56%), SaaS (5.32%), and entertainment (3.75%). This field also allows
respondents to enter free text, which greatly increases the number of categories that can be used
for matching; there are over 280 different industries in the data.
F.3. CEM Subsamples
We work with several subsamples, depending on data availability and the application of the
Coarsened Exact Matching (CEM) procedure. They are listed and described in Table F.2. The
“Full” sample constitutes our entire dataset, without any buyers excluded. For the “Baseline”
subsample we have excluded buyers who (1) only accessed basic support and (2) averaged 512
MB RAM/hour or less during their first 6 months (excluding the 1st month) or (3) made no
adjustments to the size of their infrastructure during their first 6 months (excluding the 1st month). An
infrastructure resizing occurs on any launching, halting, or resizing of a server in the buyer’s
cloud infrastructure. We do not consider behavior during the 1st month in our threshold
because most buyers are setting up their infrastructure during that time. All other subsamples are
subsets of the “Baseline” subsample. We only have visibility into buyers’ support
interactions with the provider starting in October 2009, which constrains our “Support”
subsample.
The CEM1, CEM2, and CEM3 subsamples correspond to the outcomes of the matching
process with increasingly stringent criteria, as shown in Table F.3. The actual outcomes (e.g.,
number of treated and controls per sample) are shown in Table F.4.
The descriptive statistics of all the aforementioned subsamples are included in Section F.4.
Table F.2. General Description of Subsamples

| | Full | Baseline | Support | CEM1 | CEM2 | CEM3 |
|---|---|---|---|---|---|---|
| Buyers in sample | 79,619 | 22,179 | 20,040 | 2,685 | 2,029 | 687 |
| Panel start | March ’09 | March ’09 | Oct ’09 | Oct ’09 | Oct ’09 | Oct ’09 |
| Panel end | Aug ’12 | Aug ’12 | Aug ’12 | Aug ’12 | Aug ’12 | Aug ’12 |
| Panel length (months) | 42 | 42 | 35 | 35 | 35 | 35 |
| Buyer-month observations | 1,073,998 | 368,606 | 298,539 | 48,725 | 37,837 | 13,262 |
| Buyers’ support choice behavior: | | | | | | |
| Only use basic support | 73,594 | 16,157 | 14,338 | 2,365 | 1,732 | 526 |
| Start with basic, upgrade to full support | 1,409 | 1,408 | 1,132 | 275 | 258 | 136 |
| Start with basic, upgrade to full, and downgrade to basic | 205 | 203 | 159 | 45 | 39 | 25 |
| Start with full, downgrade to basic | 215 | 215 | 215 | Excluded | Excluded | Excluded |
| Only full support | 4,196 | 4,196 | 4,196 | Excluded | Excluded | Excluded |
| Data components available: | | | | | | |
| Cloud infrastructure usage and support choice data | Yes | Yes | Yes | Yes | Yes | Yes |
| Survey data used for CEM | Incomplete | Incomplete | Incomplete | Yes | Yes | Yes |
| Support interaction data used to construct IVs | Incomplete | Incomplete | Yes | Yes | Yes | Yes |
| CEM procedure applied? | No | No | No | Yes | Yes | Yes |
Table F.3. Description of Matching Criteria used in CEM Procedures

| Abbreviation | Description | # of Categories | Categories |
|---|---|---|---|
| Emp | Employment | 5 | 0–10, 11–50, 51–100, 101–250, >250 |
| UC | General use cases (can have more than 1) | 5 | High variance, low variance, back office, hosting, test & dev |
| Mem | Memory usage (GB) in months before upgrade | 9 | <0.5, 0.5–1, 1–2, 2–4, 4–8, 8–16, 16–32, 32–64, >64 |
| Adj | Frequency of infrastructure resizing in months before upgrade | 5 | 0, 1–2, 3–9, 10–43, >43 |
| Ind | Industries | 258 | Popular ones have 11% to 15% of observations |
| t-upg | Upgrade month for treated, and month in tenure for controls | 40 | One per month; longest delay in upgrading is 40 months |
Table F.4. Coarsened Exact Matching (CEM) Procedure Outcomes

| Subsample | Strata w/ customers | Strata w/ matches | Matched / Total Buyers (Controls; Treated; Both) | Avg. Buyers per Stratum (Controls; Treated; Both) | Variables used for matching |
|---|---|---|---|---|---|
| CEM1 | 11,876 | 294 | 2,365/3,800; 320/400; 2,685/4,200 | 8.0; 1.1; 9.1 | Emp, UC, Mem, Adj, t-upg |
| CEM2 | 22,268 | 284 | 1,732/3,800; 297/400; 2,029/4,200 | 6.1; 1.0; 7.1 | Emp, UC, Adj, Ind, t-upg |
| CEM3 | 36,908 | 157 | 526/3,800; 161/400; 687/4,200 | 3.4; 1.0; 4.4 | Emp, UC, Mem, Adj, Ind, t-upg |
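The core CEM step, retaining only strata that contain both treated and control buyers, can be sketched as follows; the data structure and names are illustrative, not the authors' code.

```python
from collections import defaultdict

def cem_match(buyers):
    """buyers: iterable of (buyer_id, treated_flag, stratum_signature),
    where stratum_signature is the tuple of coarsened attribute bins.
    Returns the set of buyer_ids retained by the matching."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for bid, treated, sig in buyers:
        strata[sig]["treated" if treated else "control"].append(bid)
    matched = set()
    for groups in strata.values():
        # A stratum survives only if it contains both roles.
        if groups["treated"] and groups["control"]:
            matched.update(groups["treated"])
            matched.update(groups["control"])
    return matched
```

In the full procedure, the surviving controls would additionally receive the usual CEM weights so that each stratum's controls count as much as its treated buyers.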
F.4. Descriptive Statistics of Subsamples
Table F.5. Descriptive Statistics of Time-Varying Variables; cells show Mean / S.D. / Min / Max

| Sample (buyers; observations) | Memory | lnMemory | FullSupport | SwitchToBasic | FractionParallel |
|---|---|---|---|---|---|
| Full (79,619; 1,073,998) | 3.4 / 19.2 / 0 / 2,284.5 | 0.746 / 0.871 / 0 / 7.734 | 0.055 / 0.228 / 0 / 1 | 0.003 / 0.052 / 0 / 1 | 0.058 / 0.198 / 0 / 1 |
| Baseline (22,179; 368,606) | 7.9 / 31.4 / 0 / 2,284.5 | 1.348 / 1.040 / 0 / 7.734 | 0.160 / 0.367 / 0 / 1 | 0.008 / 0.089 / 0 / 1 | 0.121 / 0.266 / 0 / 1 |
| Support (20,040; 298,539) | 7.3 / 27.4 / 0 / 2,284.5 | 1.343 / 1.014 / 0 / 7.734 | 0.183 / 0.387 / 0 / 1 | 0.009 / 0.093 / 0 / 1 | 0.120 / 0.267 / 0 / 1 |
| CEM1 (2,685; 48,725) | 5.2 / 17.2 / 0 / 675.9 | 1.218 / 0.894 / 0 / 6.518 | 0.078 / 0.268 / 0 / 1 | 0.007 / 0.082 / 0 / 1 | 0.106 / 0.251 / 0 / 1 |
| CEM2 (2,029; 37,837) | 6.7 / 22.1 / 0 / 675.9 | 1.302 / 0.986 / 0 / 6.518 | 0.093 / 0.291 / 0 / 1 | 0.007 / 0.084 / 0 / 1 | 0.116 / 0.258 / 0 / 1 |
| CEM3 (687; 13,262) | 5.1 / 12.1 / 0 / 329.0 | 1.221 / 0.920 / 0 / 5.799 | 0.155 / 0.362 / 0 / 1 | 0.013 / 0.113 / 0 / 1 | 0.104 / 0.246 / 0 / 1 |
Table F.6. Descriptive Statistics of Variables used in Survey Data for CEM before matching (5,134 buyers); cells show Mean / S.D. / Min / Max; the last two columns give the t-test of the treated−controls mean difference

| Variable | Description | All Buyers (5,134) | Controls (3,875) | Treated (1,259) | Diff. | p-value |
|---|---|---|---|---|---|---|
| Employees | Number of employees | 195.7 / 1,102.4 / 2 / 10,000 | 164.7 / 1,019.9 / 2 / 10,000 | 291.0 / 1,320.3 / 2 / 10,000 | 126.3 | 0.000 |
| lnEmployees | ln(Employees) | 2.402 / 1.706 / 1.099 / 9.21 | 2.260 / 1.608 / 1.099 / 9.21 | 2.838 / 1.914 / 1.099 / 9.21 | 0.578 | 0.000 |
| EmpCat1 | 1{Employees ≤ 10} | 0.656 / 0.475 / 0 / 1 | 0.692 / 0.462 / 0 / 1 | 0.546 / 0.498 / 0 / 1 | -0.146 | 0.000 |
| EmpCat2 | 1{11 ≤ Employees ≤ 50} | 0.198 / 0.398 / 0 / 1 | 0.187 / 0.390 / 0 / 1 | 0.230 / 0.421 / 0 / 1 | 0.044 | 0.001 |
| EmpCat3 | 1{51 ≤ Employees ≤ 100} | 0.050 / 0.218 / 0 / 1 | 0.044 / 0.204 / 0 / 1 | 0.071 / 0.256 / 0 / 1 | 0.027 | 0.000 |
| EmpCat4 | 1{101 ≤ Employees ≤ 250} | 0.037 / 0.188 / 0 / 1 | 0.030 / 0.171 / 0 / 1 | 0.056 / 0.231 / 0 / 1 | 0.026 | 0.000 |
| EmpCat5 | 1{250 < Employees} | 0.060 / 0.237 / 0 / 1 | 0.047 / 0.213 / 0 / 1 | 0.097 / 0.296 / 0 / 1 | 0.049 | 0.000 |
| UC_HU | 1{High Uncertainty UC} | 0.463 / 0.499 / 0 / 1 | 0.469 / 0.499 / 0 / 1 | 0.447 / 0.497 / 0 / 1 | -0.021 | 0.185 |
| UC_LU | 1{Low Uncertainty UC} | 0.591 / 0.492 / 0 / 1 | 0.573 / 0.495 / 0 / 1 | 0.647 / 0.478 / 0 / 1 | 0.073 | 0.000 |
| UC_BO | 1{Back Office UC} | 0.189 / 0.391 / 0 / 1 | 0.195 / 0.396 / 0 / 1 | 0.169 / 0.375 / 0 / 1 | -0.026 | 0.043 |
| UC_HO | 1{Hosting UC} | 0.092 / 0.289 / 0 / 1 | 0.093 / 0.290 / 0 / 1 | 0.088 / 0.284 / 0 / 1 | -0.005 | 0.613 |
| UC_TD | 1{Test & Development UC} | 0.293 / 0.455 / 0 / 1 | 0.323 / 0.468 / 0 / 1 | 0.203 / 0.402 / 0 / 1 | -0.120 | 0.000 |
Table F.7. Descriptive Statistics of Variables in CEM1 Matched Sample without Weights (2,685 buyers); cells show Mean / S.D. / Min / Max

| Variable | Description | All Buyers (2,685) | Controls (2,365) | Treated (320) | Diff. | p-value |
|---|---|---|---|---|---|---|
| Employees | Number of employees | 110.2 / 830.8 / 2 / 10,000 | 94.5 / 779.7 / 2 / 10,000 | 226.4 / 1,134.3 / 2 / 10,000 | 131.8 | 0.008 |
| lnEmployees | ln(Employees) | 1.974 / 1.372 / 1.099 / 9.21 | 1.901 / 1.292 / 1.099 / 9.21 | 2.513 / 1.772 / 1.099 / 9.21 | 0.612 | 0.000 |
| EmpCat1 | 1{Employees ≤ 10} | 0.780 / 0.415 / 0 / 1 | 0.798 / 0.401 / 0 / 1 | 0.641 / 0.481 / 0 / 1 | -0.158 | 0.000 |
| EmpCat2 | 1{11 ≤ Employees ≤ 50} | 0.151 / 0.358 / 0 / 1 | 0.145 / 0.352 / 0 / 1 | 0.200 / 0.401 / 0 / 1 | 0.055 | 0.009 |
| EmpCat3 | 1{51 ≤ Employees ≤ 100} | 0.026 / 0.159 / 0 / 1 | 0.022 / 0.145 / 0 / 1 | 0.059 / 0.237 / 0 / 1 | 0.038 | 0.000 |
| EmpCat4 | 1{101 ≤ Employees ≤ 250} | 0.011 / 0.105 / 0 / 1 | 0.010 / 0.098 / 0 / 1 | 0.022 / 0.147 / 0 / 1 | 0.012 | 0.052 |
| EmpCat5 | 1{250 < Employees} | 0.032 / 0.176 / 0 / 1 | 0.026 / 0.159 / 0 / 1 | 0.078 / 0.269 / 0 / 1 | 0.052 | 0.000 |
| UC_HU | 1{High Uncertainty UC} | 0.495 / 0.500 / 0 / 1 | 0.495 / 0.500 / 0 / 1 | 0.491 / 0.501 / 0 / 1 | -0.005 | 0.880 |
| UC_LU | 1{Low Uncertainty UC} | 0.598 / 0.490 / 0 / 1 | 0.595 / 0.491 / 0 / 1 | 0.622 / 0.486 / 0 / 1 | 0.027 | 0.349 |
| UC_BO | 1{Back Office UC} | 0.093 / 0.291 / 0 / 1 | 0.089 / 0.285 / 0 / 1 | 0.125 / 0.331 / 0 / 1 | 0.036 | 0.036 |
| UC_HO | 1{Hosting UC} | 0.034 / 0.181 / 0 / 1 | 0.029 / 0.167 / 0 / 1 | 0.072 / 0.259 / 0 / 1 | 0.043 | 0.000 |
| UC_TD | 1{Test & Development UC} | 0.236 / 0.425 / 0 / 1 | 0.234 / 0.423 / 0 / 1 | 0.253 / 0.435 / 0 / 1 | 0.019 | 0.446 |
Table F.8. Descriptive Statistics of Variables in CEM1 Matched Sample with Weights (2,685 buyers); cells show Mean / S.D. / Min / Max

| Variable | Description | All Buyers (2,685) | Controls (2,365) | Treated (320) | Diff. | p-value |
|---|---|---|---|---|---|---|
| Employees | Number of employees | 226.8 / 1,135.0 / 2 / 10,000 | 278.1 / 1,368.9 / 2 / 10,000 | 226.4 / 1,134.3 / 2 / 10,000 | -51.7 | 0.827 |
| lnEmployees | ln(Employees) | 2.512 / 1.771 / 1.099 / 9.21 | 2.478 / 1.837 / 1.099 / 9.21 | 2.513 / 1.772 / 1.099 / 9.21 | 0.034 | 0.926 |
| EmpCat1 | 1{Employees ≤ 10} | 0.641 / 0.480 / 0 / 1 | 0.641 / 0.480 / 0 / 1 | 0.641 / 0.481 / 0 / 1 | 0.000 | 1.000 |
| EmpCat2 | 1{11 ≤ Employees ≤ 50} | 0.200 / 0.400 / 0 / 1 | 0.200 / 0.400 / 0 / 1 | 0.200 / 0.401 / 0 / 1 | 0.000 | 1.000 |
| EmpCat3 | 1{51 ≤ Employees ≤ 100} | 0.059 / 0.236 / 0 / 1 | 0.059 / 0.236 / 0 / 1 | 0.059 / 0.237 / 0 / 1 | 0.000 | 1.000 |
| EmpCat4 | 1{101 ≤ Employees ≤ 250} | 0.022 / 0.146 / 0 / 1 | 0.022 / 0.146 / 0 / 1 | 0.022 / 0.147 / 0 / 1 | 0.000 | 1.000 |
| EmpCat5 | 1{250 < Employees} | 0.078 / 0.268 / 0 / 1 | 0.078 / 0.268 / 0 / 1 | 0.078 / 0.269 / 0 / 1 | 0.000 | 1.000 |
| UC_HU | 1{High Uncertainty UC} | 0.491 / 0.500 / 0 / 1 | 0.491 / 0.500 / 0 / 1 | 0.491 / 0.501 / 0 / 1 | 0.000 | 1.000 |
| UC_LU | 1{Low Uncertainty UC} | 0.622 / 0.485 / 0 / 1 | 0.622 / 0.485 / 0 / 1 | 0.622 / 0.486 / 0 / 1 | 0.000 | 1.000 |
| UC_BO | 1{Back Office UC} | 0.125 / 0.331 / 0 / 1 | 0.125 / 0.331 / 0 / 1 | 0.125 / 0.331 / 0 / 1 | 0.000 | 1.000 |
| UC_HO | 1{Hosting UC} | 0.072 / 0.258 / 0 / 1 | 0.072 / 0.258 / 0 / 1 | 0.072 / 0.259 / 0 / 1 | 0.000 | 1.000 |
| UC_TD | 1{Test & Development UC} | 0.253 / 0.435 / 0 / 1 | 0.253 / 0.435 / 0 / 1 | 0.253 / 0.435 / 0 / 1 | 0.000 | 1.000 |
G. Support Interactions and Construction of Instruments
G.1. Support Interactions Coding Process
The content of the support interactions between the provider and its buyers was used to identify
three types of exogenous failures experienced by buyers. The following are the keywords and
phrases used to identify each of these types of interactions. All support interactions that matched
some keyword or phrase were visually examined to rule out false positives.
Table G.1. Keywords and Phrases Searched for Support Interactions Coding

| Support Interaction Type | Description of Event | List of keywords or phrases |
|---|---|---|
| SuppOutage | Provider may suffer from generalized outages in different components of its service (e.g., memory leak in provider’s cloud management system). Such generalized problems are announced on the provider’s status webpage and/or announced to buyers. | Provider’s service status URL, cloud status, outage, scheduled maintenance, undergoing maintenance |
| SuppNetwork | Some node in the provider’s infrastructure, generally belonging to some buyer, is suffering from a distributed denial of service (DDoS) attack, or some networking hardware device has failed. | Server does not respond to ARP requests, faulty switch, network issue in our data center, lb in error state, load-balancer hardware nodes, DDoS |
| SuppHost | Buyer is suffering degraded performance due to a problem in the physical host on which the buyer’s virtual machine runs. Problems are generally associated with excessive read/write operations on the hard disks, either by the buyer or by another buyer whose virtual machine lives on the same physical server. Problems could also be associated with failure of the physical hardware. | Consuming a significant amount of Disk I/O, very high disk I/O usage, iowait, iostat, swapping, swappers, swap space, extreme slowness, slowdown problems, hardware failure, degraded hardware, drive failing, drives failing, server outage, host failure, server is down, server down, site down, host became unresponsive, server unresponsive, server not responding, server is unresponsive, is hosted on has become unresponsive, problem with our server, host server, physical host, physical hardware, physical machine, host machine, failing hardware, imminent hardware issues, migrate your cloud server to another host, queued for move, issue on the migrations, host server of your cloud servers |
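The keyword screen can be sketched as a simple case-insensitive substring search. The keyword lists below are abbreviated from the table, and in the actual coding process every hit was still inspected manually to rule out false positives.

```python
# Abbreviated keyword lists, by support interaction type.
KEYWORDS = {
    "SuppOutage": ["cloud status", "outage", "scheduled maintenance"],
    "SuppNetwork": ["faulty switch", "ddos", "network issue in our data center"],
    "SuppHost": ["iowait", "degraded hardware", "host became unresponsive"],
}

def code_interaction(text: str):
    """Return the types whose keywords appear in a support interaction."""
    lowered = text.lower()
    return [t for t, kws in KEYWORDS.items() if any(k in lowered for k in kws)]
```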
24
G.2. Construction of Support-Based Variables
Let K ∈ {SuppOutage, SuppNetwork, SuppHost} represent a type of support interaction
identified through the coding process. Let NumK_{i,t} be the number of support interactions of type K
counted for buyer i during month t. Further, let AccK_{i,t} be the accumulated number of support
interactions of type K that buyer i has experienced up to month t; formally, AccK_{i,t} =
Σ_{s ≤ t} NumK_{i,s}. Finally, we construct indicators that turn on once the accumulated number of
interactions reaches n = 1, 2, defined as Kn_{i,t} = 1{AccK_{i,t} ≥ n}. Then, for example, variable
SuppOutage2_{i,t} will be equal to 1 if buyer i has accumulated at least 2 support interactions that
have been coded as type SuppOutage by month t.
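A pandas sketch of these definitions, with illustrative column names following the NumK/AccK pattern:

```python
import pandas as pd

def build_support_vars(df: pd.DataFrame, k: str) -> pd.DataFrame:
    """Given monthly counts Num<k> per buyer-month, add the cumulative
    count Acc<k> and the indicators <k>1 and <k>2 (1{Acc<k> >= n})."""
    df = df.sort_values(["buyer", "month"]).copy()
    df[f"Acc{k}"] = df.groupby("buyer")[f"Num{k}"].cumsum()
    for n in (1, 2):
        df[f"{k}{n}"] = (df[f"Acc{k}"] >= n).astype(int)
    return df
```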
G.3. Descriptive Statistics of Support Interactions

Table G.2. Descriptive Statistics of Support Interactions-based Variables (Support and CEM1 subsamples); cells show Mean / S.D. / Min / Med / 75th / 90th / 95th / 99th / Max

| Variable | Support: 20,040 buyers (N = 298,539) | CEM1: 2,685 buyers (N = 48,725) |
|---|---|---|
| NumSuppOutage | 0.009 / 0.123 / 0 / 0 / 0 / 0 / 0 / 0 / 9 | 0.006 / 0.090 / 0 / 0 / 0 / 0 / 0 / 0 / 4 |
| AccSuppOutage | 0.067 / 0.409 / 0 / 0 / 0 / 0 / 0 / 2 / 18 | 0.045 / 0.300 / 0 / 0 / 0 / 0 / 0 / 1 / 8 |
| SuppOutage1 | 0.044 / 0.205 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.031 / 0.174 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
| SuppOutage2 | 0.012 / 0.109 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.008 / 0.089 / 0 / 0 / 0 / 0 / 0 / 0 / 1 |
| NumSuppNetwork | 0.002 / 0.059 / 0 / 0 / 0 / 0 / 0 / 0 / 10 | 0.001 / 0.039 / 0 / 0 / 0 / 0 / 0 / 0 / 3 |
| AccSuppNetwork | 0.017 / 0.172 / 0 / 0 / 0 / 0 / 0 / 1 / 12 | 0.010 / 0.108 / 0 / 0 / 0 / 0 / 0 / 0 / 3 |
| SuppNetwork1 | 0.013 / 0.114 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.009 / 0.095 / 0 / 0 / 0 / 0 / 0 / 0 / 1 |
| SuppNetwork2 | 0.002 / 0.046 / 0 / 0 / 0 / 0 / 0 / 0 / 1 | 0.001 / 0.025 / 0 / 0 / 0 / 0 / 0 / 0 / 1 |
| NumSuppHost | 0.024 / 0.197 / 0 / 0 / 0 / 0 / 0 / 1 / 21 | 0.020 / 0.170 / 0 / 0 / 0 / 0 / 0 / 1 / 11 |
| AccSuppHost | 0.239 / 0.960 / 0 / 0 / 0 / 1 / 1 / 3 / 114 | 0.188 / 0.775 / 0 / 0 / 0 / 1 / 1 / 3 / 30 |
| SuppHost1 | 0.146 / 0.354 / 0 / 0 / 0 / 1 / 1 / 1 / 1 | 0.118 / 0.323 / 0 / 0 / 0 / 1 / 1 / 1 / 1 |
| SuppHost2 | 0.043 / 0.204 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.031 / 0.174 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
Table G.3. Descriptive Statistics of Support Interactions-based Variables (CEM2 and CEM3 subsamples); cells show Mean / S.D. / Min / Med / 75th / 90th / 95th / 99th / Max

| Variable | CEM2: 2,029 buyers (N = 37,837) | CEM3: 687 buyers (N = 13,262) |
|---|---|---|
| NumSuppOutage | 0.007 / 0.103 / 0 / 0 / 0 / 0 / 0 / 0 / 4 | 0.011 / 0.127 / 0 / 0 / 0 / 0 / 0 / 0 / 4 |
| AccSuppOutage | 0.057 / 0.348 / 0 / 0 / 0 / 0 / 0 / 2 / 8 | 0.072 / 0.414 / 0 / 0 / 0 / 0 / 0 / 2 / 8 |
| SuppOutage1 | 0.038 / 0.191 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.045 / 0.207 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
| SuppOutage2 | 0.011 / 0.104 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.014 / 0.119 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
| NumSuppNetwork | 0.001 / 0.040 / 0 / 0 / 0 / 0 / 0 / 0 / 2 | 0.001 / 0.042 / 0 / 0 / 0 / 0 / 0 / 0 / 2 |
| AccSuppNetwork | 0.011 / 0.111 / 0 / 0 / 0 / 0 / 0 / 1 / 3 | 0.013 / 0.117 / 0 / 0 / 0 / 0 / 0 / 1 / 2 |
| SuppNetwork1 | 0.010 / 0.101 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.012 / 0.109 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
| SuppNetwork2 | 0.001 / 0.026 / 0 / 0 / 0 / 0 / 0 / 0 / 1 | 0.001 / 0.025 / 0 / 0 / 0 / 0 / 0 / 0 / 1 |
| NumSuppHost | 0.022 / 0.182 / 0 / 0 / 0 / 0 / 0 / 1 / 11 | 0.026 / 0.216 / 0 / 0 / 0 / 0 / 0 / 1 / 11 |
| AccSuppHost | 0.209 / 0.854 / 0 / 0 / 0 / 1 / 1 / 3 / 30 | 0.242 / 1.089 / 0 / 0 / 0 / 1 / 1 / 4 / 30 |
| SuppHost1 | 0.126 / 0.332 / 0 / 0 / 0 / 1 / 1 / 1 / 1 | 0.136 / 0.343 / 0 / 0 / 0 / 1 / 1 / 1 / 1 |
| SuppHost2 | 0.035 / 0.185 / 0 / 0 / 0 / 0 / 0 / 1 / 1 | 0.041 / 0.198 / 0 / 0 / 0 / 0 / 0 / 1 / 1 |
H. System GMM Procedure Specifications
H.1. SGMM with lnMemory_{i,t} as Dependent Variable
To implement our SGMM estimation, we first select the appropriate number of lags of the
dependent variable to be included as regressors. Following a process similar to that of
Chen et al. (2015), we selected an initial number of lags consistent with our phenomenon of
interest, and then gradually reduced the number of lags while testing for serial correlation in the
errors and the validity of the overidentifying restrictions. We chose to start with 3 lags of
lnMemory_{i,t} based on the provider’s belief that it takes buyers about 3 months to stabilize
their behavior. In our first run, using all available instruments (from the 1st lag of the first
differences and from the 2nd lag of the values up to the end of the panel), the Arellano and
Bond (1991) serial correlation test indicated that we have not only the expected 1st-order but
also 2nd-order serial correlation. As a result, we cannot use either the 1st lag of the variables’
first differences or the 2nd lag of the variables’ values as instruments. However, we can still
rely on the 2nd lag of the variables’ first differences as instruments for the levels equation and
on their 3rd and later lags as instruments for the differences equation (Cameron and Trivedi
2010). We did not find a valid specification with fewer than 3 lags of lnMemory_{i,t}.
We show the model with all available instruments in column (6) of Table 4. It passes
the Hansen (1982) J test for the validity of our overidentifying restrictions with
χ²(797) = 761.77, p = 0.810. We also verified that we do not suffer from 3rd- or
higher-order serial correlation in our errors. Then, to avoid overfitting the model
with too many instruments (Roodman 2009b), we gradually reduced the number of lags used as
instruments until we found the smallest number of instruments under which we still passed the
Hansen J test of instrument validity. We found that we can limit our model to the use of the 3rd lag
of lnMemory_{i,t} and from the 3rd to the 11th lag of FullStatus_{i,t}. This model’s results are
reported in column (7) of Table 4. We once again pass all specification tests. Moreover, we also
reduced the total number of instruments from 864 to just 296.
Finally, we augment our instrument matrix for these same model specifications with the
exogenous failure-based instruments used in column (4) of Part C of Table 3. The results with
this augmented instrument matrix are shown in columns (8) and (9) of Table 4, with all
instruments and with the smallest number of instruments, respectively.
H.2. SGMM with FractionParallel_{i,t} as Dependent Variable
Using 3 lags of the dependent variable as before and all available instruments, the Arellano and
Bond (1991) test again found 2nd-order serial correlation, so we must again rely on the 2nd lag of
the variables’ first differences and the 3rd or later lags of their values as instruments. After adjusting
for this, we found that the number of lags of the dependent variable included as regressors can be
reduced from 3 to 2. We report the System GMM estimation with all available instruments in
column (6) of Table 7. Then, we reduce the number of lags of the variables used as instruments
until we find the valid model that uses the fewest lags of FractionParallel_{i,t} and
FullStatus_{i,t}. The specification used in column (7) of Table 7 uses the 2nd lag of the variables’
first differences as instruments for the levels equation, and also uses the 3rd through 12th lags of
FractionParallel_{i,t} and the 3rd through 8th lags of FullStatus_{i,t} as instruments for the
first-differences equation.
I. Cloud Scaling Methods and Construction of the FractionParallel_{i,t} Variable
This appendix elaborates on the pros and cons of having a horizontally scalable architecture and
then explains the technical process followed to construct the FractionParallel_{i,t} variable.
I.1. Scaling Vertically vs. Horizontally
In the context of IT infrastructures, there are essentially two ways of scaling the total allocated
compute capacity: vertically and horizontally (Garcia et al. 2008; Michael et
al. 2007; Reese 2009, p. 176). These are also called scaling up and scaling out, respectively.
Scaling vertically implies increasing the capacity of a server or spreading out the IT stack
across several servers, in either case having at most one server per function. While this approach
is easy to implement, growth in vertical scaling is capped by the maximum server capacity
available.
In contrast, under horizontal scaling several servers perform functions in parallel and this
scaling method offers virtually unlimited growth potential. Buyers may prefer to scale
horizontally for other reasons. Given the relatively high likelihood of a commodity cloud server
failing, an IT infrastructure architecture designed for cloud environments will optimally have its
workloads distributed across several nodes, rather than all concentrated in a single node (Reese
2009). However, despite its advantages, horizontal scaling also presents challenges associated
with load balancing and session management across servers, among others (Casalicchio and
Colajanni 2000; Cherkasova 2000; and interviews with cloud experts at IBM Thomas J. Watson
Research Center, Yorktown Heights, New York, and a major technological research university).
Therefore, having servers that work in parallel increases the complexity of the architecture and,
at the same time, signals a better and more efficient use of advanced cloud features.
I.2. The FractionParallel_{i,t} Variable
Our analysis of the complexity of buyer infrastructure deployments is based on an automated
analysis of the names given by buyers to their servers. We developed an algorithm that compares
the names of the servers being run by each buyer at the end of every day in our sample and
checks whether any servers have names very similar to each other. Specifically, we consider two
server names to be similar to each other if they have a Levenshtein (1966) distance that is less
than or equal to two, meaning that one server’s name can be made equal to the other by editing
(inserting, deleting, or substituting) 2 characters (letters or numbers) or fewer. Our assumption is
that if we find two or more servers with very similar names, they are very likely performing
the same function in parallel (e.g., web1.domain.com and web2.domain.com). If we find different
sets of servers with similar names, we count them all together as functioning in parallel (e.g.,
web1, web2, and web3, and database1 and database2, are five servers working in parallel). At the
end of each day in buyer i’s tenure, we count the number of servers with similar names and
divide the count by the total number of servers being run, and then average the metric over
month t. The resulting average fraction of servers running in parallel is captured in our new
dependent variable, FractionParallel_{i,t} (see Table 1 for descriptive statistics).
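A minimal sketch of the daily measure, using a textbook dynamic-programming Levenshtein distance. The distance-2 threshold follows the text; the function names and structure are our own illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fraction_parallel(server_names: list) -> float:
    """Share of a buyer's servers (on one day) whose name is within
    edit distance 2 of at least one other server's name."""
    if not server_names:
        return 0.0
    flagged = sum(
        any(levenshtein(n, m) <= 2 for j, m in enumerate(server_names) if j != i)
        for i, n in enumerate(server_names)
    )
    return flagged / len(server_names)
```

Averaging this daily share over a month would then give the monthly FractionParallel value.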
J. Impulse Response Functions
An impulse response function represents the response of a dependent variable to a (one-time)
unit change in some covariate while all other variables dated t or earlier are held constant
(Hamilton 1994, pp. 318-323). In our case, we compute and plot ∂lnMemory_{i,t} / ∂AdoptFull_{i,t−j}
over time to show how current memory usage is influenced by adoption of full support j periods ago.

These derivatives are the coefficients of the associated rational lag model (Greene 2008,
pp. 683-686). The rational lags identify the effect that each lag of the covariates, on its own, has
on the dependent variable. In the case of the coefficients of AdoptFull_{i,t−j} in Model (2), the
early lags (i.e., j ≤ 2) have a specific formulation, while the later lags (i.e., j ≥ 3) follow a
recursive form. The approach is very similar to that of Example 20.4 in Greene (2008, pp. 685-686).
The rational lag coefficients, which we denote δ_j, are computed as follows:

δ_0 = ∂lnMemory_t / ∂AdoptFull_t = β_0

δ_1 = ∂lnMemory_t / ∂AdoptFull_{t−1} = β_1 + ρ_1·δ_0

δ_2 = ∂lnMemory_t / ∂AdoptFull_{t−2} = β_2 + ρ_1·δ_1 + ρ_2·δ_0

δ_j = ∂lnMemory_t / ∂AdoptFull_{t−j} = β_j + ρ_1·δ_{j−1} + ρ_2·δ_{j−2} + ρ_3·δ_{j−3},  j ≥ 3,

where the β_j are the coefficients on the lags of AdoptFull and the ρ_k are the coefficients on the
lags of the dependent variable. Since the model for FractionParallel_{i,t} only has 2 (rather than 3)
lags of the dependent variable, the recursive form starts from δ_2.
As an additional step, we use Monte Carlo simulation and draw 100,000 random samples
of the vectors of the β_j and ρ_k coefficients using their estimates and their variance-covariance
matrix. For each draw we compute and record the rational lags δ_j, and we use their distributions to
estimate their 90% confidence intervals. In Figure 1, the dashed lines represent the 5th and 95th
percentiles of the distribution of each δ_j.
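The rational-lag recursion described above is straightforward to compute numerically; a minimal sketch follows. The coefficient values used in the demonstration are placeholders, not estimates from the paper.

```python
def impulse_response(beta, rho, horizon):
    """Rational-lag impulse responses delta_0..delta_horizon.

    beta: coefficients on the covariate's lags 0..len(beta)-1 (treated as
          zero for lags not included in the model);
    rho:  coefficients rho_1..rho_p on the lagged dependent variable.
    """
    delta = []
    for j in range(horizon + 1):
        b = beta[j] if j < len(beta) else 0.0
        # AR feedback: rho_k * delta_{j-k} for every available earlier response.
        ar = sum(r * delta[j - k] for k, r in enumerate(rho, 1) if j - k >= 0)
        delta.append(b + ar)
    return delta
```

The Monte Carlo confidence bands would then be obtained by re-running this function on coefficient vectors drawn from the estimated variance-covariance matrix and taking the 5th and 95th percentiles of each δ_j.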