7/28/2019 Internet vs TV 2
Internet vs. TV Advertising: A Brand-Building
Comparison
Michaela Draganska
The Wharton School
Wesley R. Hartmann
Stanford GSB
Gena Stanglein
Google
Abstract
A key issue for media planners determining the share of their advertising budgets to spend on Internet advertising is whether Internet advertising can build brands as effectively as television advertising. To address this question, we extend traditional brand-message recall measurement to facilitate comparisons between Internet formats and television. Specifically, we supplement brand-message surveys conducted during the campaign with a set of pre-campaign surveys to control for pre-existing brand knowledge, and use a matching procedure to ensure the pre-campaign sample is comparable to the in-flight one.
For our analysis, we use a rich data set comprising 20 campaigns across multiple industries ranging from consumer packaged goods to telecommunications. We find substantial cross-brand variation in pre-existing knowledge as well as variation across advertising formats. In particular, individuals exposed to Internet display ads have significantly lower levels of pre-existing brand knowledge than television viewers. Such differences in initial conditions suggest biases in comparisons between Internet and television ads, and possibly a more general failure of the brands to establish lasting associations among individuals shifting media consumption from TV to the Internet. Incorporating these pre-existing differences between media formats results in brand-lift measures for Internet ads that are statistically indistinguishable from comparable television lift measures.
Keywords: advertising, display, television, Internet.
The authors would like to thank Oscar Mitnik and Amogh Vasekar for valuable assistance, as well as Rawley Cooper, Brent Davis and Scott McKinley at Nielsen for their help in executing the study. Draganska and Hartmann served as consultants during the survey design and administration phases of the project.
Email: [email protected], [email protected], [email protected]
1 Introduction
Over the past decade advertising expenditures have shifted from traditional media to the
Internet. In 2011, online advertising in the United States alone reached $32 billion and is
projected to reach $62 billion by 2016 (eMarketer, February 2012 report). Internet portals
determined to use their inventory to substitute for traditional advertising formats have turned
to quantitative metrics to illustrate the advantages of online advertising. They are armed
with the ability to readily observe behavioral responses on the web, such as click-through
rates, and to conduct large-scale online experiments to provide the most accurate measure of
the effectiveness of the ads in driving consumer purchasing decisions (Lewis & Reiley 2011,
Goldfarb & Tucker 2011).
Nevertheless, many advertisers still hesitate to shift spending from television campaigns
to the Internet, pointing to the established role of TV advertising in building brands. The
solid experimental evidence quantifying the behavioral response to Internet advertising does
not seem to be a sufficient reason, because no direct comparison to the effectiveness of TV
as a brand-building medium is available. In general, TV experiments are costly and thus not
scalable for widespread application to allow for a comparative study of the effectiveness of
online and offline campaigns. Older experimental studies on TV advertising, most notably
by Lodish, Abraham, Kalmenson, Livelsberger, Lubetkin, Richardson & Stevens (1995a) and
Lodish, Abraham, Kalmenson, Livelsberger, Lubetkin, Richardson & Stevens (1995b), do not
have Internet data. Moreover, given the typical sample sizes of such experiments, attempts
to detect a significant effect of advertising on sales frequently fail for lack of statistical power (Lewis & Reiley 2011), so these studies offer little guidance here.
For that reason and because of the perception that TV advertising is the main medium
for brand building, the metrics typically used to assess its effects are brand awareness and
preference. The rationale is that, although direct links to eventual purchase are sometimes
possible, brand advertising on television is primarily aimed at influencing the mindset of a
customer who may purchase anytime within a reasonably long horizon (Assmus, Farley &
Lehmann 1984). By contrast, the effect of online advertisements has been measured mostly
on outcomes such as click-through rates and generated sales.
A few recent studies have questioned the emphasis on sales measures and have pointed to
the brand-building potential of the Internet (Briggs & Hollis 1997, Dreze & Hussherr 2003).
In one of the earliest studies of online advertising, Briggs & Hollis (1997) show that banner
ads can also have an effect on brand awareness and image, even in the absence of a behavioral
response such as a click-through. Using eye-tracking devices in conjunction with a large-scale
survey, Dreze & Hussherr (2003) find that consumers avoid looking at banners, but there is
still an effect on brand recall measures, suggesting a pre-attentive level of processing. This
research implies that attitudinal measures may be more appropriate not just for assessing
the effectiveness of TV commercials but that of online advertising as well.
To date, however, no field data have been available to enable media planners to conduct
an apples-to-apples comparison of the advertising effectiveness of online and offline media in
terms of creating brand awareness and establishing brand associations. This paper seeks to
fill this gap and measure Internet advertising performance, specifically the performance of various non-search advertising formats, according to the metrics advertisers have historically
relied on for their television campaigns.1
We have a unique data set of 20 advertising campaigns spanning a wide variety of product
categories and industries. In addition to TV commercials, we have data for Internet banner,
rich media and video ads. The advertising campaigns use the online and TV advertising
formats concurrently, and the effect of the commercials is assessed using the same brand-
recall measure (ability of respondent to correctly link creative to brand) for all advertising
formats, thus providing the data for a valid comparison of the effectiveness of the different
media.
1At the start of 2011, Google, The ARF, Nielsen, Stanford, and Wharton collaborated on an initiative to enhance the media planning and buying process. The goal was to quantify cross-media ad-format effectiveness and derive the relative impact of ad formats. The first phase of this project was a pilot measuring the brand cut-through (i.e., ability of consumers to correctly link a brand to a creative) of ad formats across ad campaigns.
The performance of an ad campaign is gauged relative to a baseline and is referred to
as lift. One could posit, as is common in the industry, that absent advertising consumers
would randomly associate brands with commercials. However, especially for mature brands,
assuming consumers do not have any pre-existing brand knowledge due to exposure to past
advertising, word of mouth, or other experiences with the brand is naive. In addition, for us
to compare advertising effects for a given campaign across formats, potential customers who
extensively use the Internet and those who predominantly watch TV need to have the same
level of pre-existing familiarity with the brand. If the existing stock of past advertising differs
by media behavior and is thus dependent on the type of ad format, to which an individual is
likely to be exposed, using a constant baseline across formats would no longer yield a valid
comparison.
To account for such potential disparities in the pre-existing familiarity with the brand
across media, we have modified the traditional television recall methodology to include a pre-
campaign survey to obtain the initial conditions of the advertising stocks for consumers
with different media-consumption habits. We avoid a testing bias by employing a repeated cross-section design rather than a true panel; that is, we measure pre-campaign brand recall
and in-flight (during the campaign) brand recall for separate sets of consumers.
We ensure the comparability of the pre-campaign and in-flight survey groups by employ-
ing a nearest-neighbor matching procedure (Abadie & Imbens 2012). This technique allows
us to select only those individuals from the pre-campaign sample who exhibit media con-
sumption behavior similar to that of the individuals surveyed during the campaign. Having
this pre-campaign measure for an equivalent group gives us a much more accurate baseline
to establish the lift of a campaign relative to assuming random guessing as is typically done
in the industry.
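The matching step can be sketched as follows. This is a minimal illustration of one-to-one nearest-neighbor matching on media-consumption covariates, not the full Abadie & Imbens estimator; the covariates (weekly TV hours, weekly Internet hours) and all numbers are hypothetical.

```python
import math

def nearest_neighbor_match(inflight, pre):
    """For each in-flight respondent, return the index of the pre-campaign
    respondent whose media-consumption covariates are closest in
    Euclidean distance."""
    matches = []
    for x in inflight:
        dists = [math.dist(x, p) for p in pre]
        matches.append(dists.index(min(dists)))
    return matches

# Hypothetical covariates: (weekly TV hours, weekly Internet hours)
inflight = [(20.0, 5.0), (2.0, 30.0)]
pre = [(19.0, 6.0), (3.0, 28.0), (40.0, 1.0)]
idx = nearest_neighbor_match(inflight, pre)  # [0, 1]

# Baseline recall computed only over the matched pre-campaign respondents
pre_recall = [1, 0, 1]  # 1 = correct brand-message link
baseline = sum(pre_recall[i] for i in idx) / len(idx)  # 0.5
```

In the paper's application, the covariates would be each panelist's observed media behavior, and the matched subsample's recall rate serves as the campaign-specific baseline.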
We find substantial differences in the pre-existing levels of brand knowledge both across
campaigns and across advertising formats. In particular, respondents who were exposed
predominantly to the Internet formats had a lower level of pre-existing brand knowledge
than TV viewers. Our analysis further reveals that ignoring the initial conditions results
in different conclusions regarding the relative effectiveness of TV versus online formats.
Comparing the impact of the three online formats - banner ads, rich media, and video - to
commercials aired on TV using the traditional measure of in-flight brand recall, we find TV is
superior to the Internet. Upon adjusting for the pre-existing differences in brand knowledge
by format, however, we find that Internet ad performance is statistically indistinguishable
from TV.
In the next section, we provide some background on the traditional brand-recall measures
and define the conditions under which the methodology can be interpreted causally. Section
3 explains the data-collection procedure and provides a description of the variables used in
the analysis. We proceed by outlining our empirical strategy in section 4 and then present
the findings in section 5. Section 6 concludes with directions for future research.
2 Traditional Recall Methodology
A long-established practice in advertising research is to survey individuals who were exposed
to an ad the previous day to determine the extent to which they recall the ad message, the
brand, and can link the message to the corresponding brand (Rossiter & Bellman 2005).
Although these attitudinal measures only approximate the effect on purchase behavior that
advertisers are ultimately interested in, they have gained wide acceptance and usage. In
an early study, Wells (1964) compared the different recognition, recall and rating scales
employed in practice and concluded that recall scores, which reflect the advertisement's
ability to register the sponsor name and to deliver a meaningful message to the consumer,
are particularly trustworthy. More recently, Krishnan & Chakravarti (1999) review existing
memory tests for assessing advertising effectiveness and underscore their value across a wide
range of advertising objectives.
The ad message could inform the viewer about the existence or functional attributes of
the brand, or establish non-functional brand associations. It produces memory traces about
brand-specific and message-specific information, about the product category, evaluative re-
actions, and brand identification (Hutchinson & Moore 1984). For the message to have an
effect, the consumer needs to know which brand is being advertised. Empirical studies have
shown that this is a nontrivial task, as only about 40% of consumers who have viewed a
commercial recall the sponsor of the message (Franzen 1994). Establishing brand-message
links therefore is a critical input to brand building. In the present research, we focus specif-
ically on the question of whether a respondent prompted with a description of the ad can
recall the brand.
Message-recall studies have a few reasons for selecting individuals who viewed the ad the
day before. First, the goal is to assess the creative's ability to link the brand and message,
and not necessarily to assess the quality of the message itself. For example, one could
imagine assessing recall several days after the individuals view the ad to see whether the
ad sticks. This measure, however, says more about the memorability of the message than
the creative's ability to help the brand cut through and get viewers' attention. Second,
exposure is traditionally inferred based on self-reported viewing of a TV program during which a commercial was aired (opportunity to see). That is, respondents have been required
to recall their program viewership to establish exposure. Doing so for a longer period of
time can result in too much error. Passive measurement of exposure, for example, through
a meter installed on the TV set or through other tracking devices, can resolve this problem,
but such measurement is not available at a large enough scale for all advertising.
To describe the traditional methodology, we begin by introducing some notation. Let
Ys be an indicator for whether respondent s can correctly select the brand from a multiple-
choice list after being prompted with a description of the advertising message. Xs is an
indicator for whether respondent s was exposed to the ad. Finally, let ys0 be a probabilistic
assessment of how well respondent s could have guessed the associated brand before the ad
was run. Then the traditional recall methodology defines ∆ = E[Y − y0 | X = 1] to be the
expected lift among the exposed population in linking the brand with the message. The
estimator for ∆ is

∆̂ = Σ{s|Xs=1} (Ys − ys0) ws,   (1)
where the summation conditions on respondents who have been exposed to the ad and ws
weighs respondents based on how representative they are of the entire population exposed
to the ad.2 If respondents are randomly drawn from the exposed population, there is no
need for the weight ws. This term arises here because of selection issues market research
organizations have in recruiting their panels.
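As a concrete illustration, equation (1) is a weighted sum of individual lifts over the exposed sample. The respondent data below are invented for illustration, and the weights are assumed to sum to one over the exposed respondents.

```python
def lift(exposed):
    """Estimator of equation (1): sum over exposed respondents s of
    (Y_s - y_s0) * w_s, where Y_s is the 0/1 brand-recall indicator,
    y_s0 the respondent's baseline probability of linking brand and
    message, and w_s a representativeness weight."""
    return sum((Y - y0) * w for Y, y0, w in exposed)

# Hypothetical exposed respondents: (Y_s, y_s0, w_s)
sample = [(1, 0.25, 0.4), (0, 0.25, 0.3), (1, 0.40, 0.3)]
delta_hat = lift(sample)  # 0.4*0.75 - 0.3*0.25 + 0.3*0.60 = 0.405
```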
The baseline ys0 captures consumers' past interactions with the brand and provides a
measure of the extent to which its advertising has established an association between brand
and message. One might expect successful brands to already have a reasonably high baseline
association with the message because the message is probably related to associations they
have previously communicated. On the other hand, a new brand may have no pre-existing
associations that could be tied to the message, resulting in a small baseline.
In practice, ys0 is typically a predetermined constant, such as the success rate of guessing
at random, that is the same for all respondents. Obtaining a more accurate measure of the
baseline ys0 by establishing an initial condition for the campaign is important both for the lift
measurement above as well as for providing the advertiser with information about how well
past campaigns have imprinted a brand image. Furthermore, if ∆ is to be compared across ad
formats, recognizing that individuals exposed to different formats may systematically differ
in their level of pre-campaign associations is critical.
The traditional recall methodology can be characterized as trying to measure a treatment effect on the treated population (Heckman, Ichimura & Todd 1997, Imbens 2004). This
2The canonical message-recall measure focuses on assessing a single airing of an ad. Yet practitioners often group together multiple ads in a day as well as ads aired across multiple weeks of the campaign. Nevertheless, the estimator in equation (1) is still applied, but the meaning may change because responses for ads later in the campaign could involve more campaign exposures than those for ads earlier in the campaign. Practitioners have attempted to account for multiple campaign exposures by considering the build and/or decay in the brand associations throughout the campaign. With enough surveys, one could repeat the above analysis at each point in time, but more often the researcher tries to estimate how the responses vary with how far along the campaign is in terms of either time or total exposures.
terminology arises from the focus on only measuring the effect for those individuals who
were exposed, that is, conditioning on X = 1 in ∆ = E[Y − y0 | X = 1].
The primary challenge to a causal interpretation of recall studies is the establishment
of a control condition. Because the same individual cannot be simultaneously exposed and
unexposed, measuring ys0 for a respondent who is exposed is typically impossible. To clarify
the problem, we separate the lift measure, ∆ = E[Y − y0 | X = 1], into two independent
expectations: ∆ = E[Y | X = 1] − E[y0 | X = 1]. The first component of this expression,
the probability of correctly identifying the brand if an individual was exposed to the ad
campaign, E[Y|X = 1], can be easily obtained from observed recall and exposure data. The
latter component, E[y0|X = 1], requires assessing the control outcomes, that is assessing
whether a respondent would have correctly linked the creative to the brand without seeing
the ad. This measurement requires experimentation and presents a particular challenge
for media such as television. In section 4 we propose a method to obtain this measure by
augmenting the traditional methodology described above with a pre-campaign survey. Before
we proceed, we first introduce the data set in the next section.
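The decomposition above suggests a simple sample-analog estimator: estimate E[Y | X = 1] from the in-flight sample and E[y0 | X = 1] from pre-campaign respondents matched to the exposed group. The sketch below illustrates this with fabricated 0/1 recall indicators.

```python
def lift_with_baseline(inflight_recall, matched_pre_recall):
    """Sample analog of delta = E[Y | X=1] - E[y0 | X=1]:
    in-flight recall among the exposed minus the recall rate of a
    matched, eventually-exposed pre-campaign sample, which stands in
    for the unobservable counterfactual baseline."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(inflight_recall) - mean(matched_pre_recall)

# Hypothetical 0/1 recall indicators (1 = correct brand-message link)
inflight = [1, 1, 0, 1, 0]   # surveyed during the campaign
pre = [1, 0, 0, 1, 0]        # matched pre-campaign respondents
delta = lift_with_baseline(inflight, pre)  # 0.6 - 0.4 = 0.2
```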
3 Data
3.1 Data Collection and Variables
The data-collection effort employs Nielsen's TV Brand Effect panel. This panel consists of
a large number of participants who reveal their advertising exposures across Internet and
television formats and answer creative and brand-recall survey questions on rewardtv.com.
The panel consists of more than six million registered members, with a weekly average
of 26,000 participants. On average, a panelist would visit the rewardtv.com site 1.5 times a
week and take 1.7 surveys per visit. Approximately 83% of panelists are new each month.
Because we rarely observe the same individual for longer stretches of time, the panel is best
considered a repeated cross section, which limits our ability to make before-after comparisons
of the same individual. However, repeatedly asking a respondent about the same ad and set
of brands may lead to conditioning effects (testing bias), so not having a long time series is
not necessarily a negative feature in this setting.
Nielsen recruits panelists across various Internet portals and sites and through word of
mouth. To maximize daily participation, the site provides a lot of entertainment content,
along with sweepstakes, auctions, and discounts. The incentives are soft, though, thus
ensuring a high turnover and minimizing the potential for conditioning effects. Nielsen
conducts periodic checks to ensure the panelists exhibit the same TV viewing and Internet
usage behavior as other Nielsen panelists, and uses weights to ensure the representativeness
of each surveyed individual. The first survey for new panelists is eliminated to allow for a
training/experimentation period, and any abnormal participation and response patterns are
carefully examined.
In addition to TV commercials, we investigate three online formats - banner ads, rich
media, and video. Banner ads are ads with or without animation with which the user
cannot interact. Examples include overlays on video content, companion banners, wallpapers,
and skyscrapers. Video is any streaming video, pre-roll, post-roll, or in-roll. Rich media are any ads with which the mouse can interact without necessarily activating a click-through,
such as expandable ads, interactive game ads, and corner peels.
To record online ad exposure, online ad creatives are tagged and then linked to the
panelists via cookies on their computers. Provided cookies are not erased and the user does
not change computers, Internet exposures are complete for the duration of the campaign
irrespective of an individual logging in on rewardtv.com. Television exposure is inferred
when a respondent logs in to rewardtv.com and states that on the preceding day, she watched
a program that is known to have run an advertisement from the campaign (opportunity to
see). For TV exposures, we thus do not observe exposures an individual may have had prior
to logging onto rewardtv.com.
When an individual logging in is identified as having been exposed to an ad, she is
presented with a description of a scene from a commercial (for an example, see Table 1).
This description often comes in the form of a question assessing whether the respondent can
recall the creative. Next, the respondent is asked to indicate which of four listed brands the
commercial was for.
Table 1: Example for creative-recall and brand-recall questions
In a commercial during this show, who spoke directly to the camera and said, "I just bought stock... you just saw me buy stock," as he sat at a computer keyboard?
Well-spoken baby who eventually spat up all over the place Monkey wearing a custom-tailored suit and a fine silk tie
Simple peasant from the past who came from a rural village
Alien from outer space who did not speak earth language
What was this a commercial for?
E Trade
TD Ameritrade
Scottrade
Charles Schwab
Questions are asked in the same way for all formats. Brand recall, however, is only
measured conditional on creative recall in the case of TV, as opposed to rich media, video,
and banner ads, where all responses are recorded. To keep the data comparable, we retain
individuals who answered the creative-recall question correctly for all formats. The sample
sizes by format and campaign are reported in Table 2.
We collected data for 20 advertising campaigns run in 2011 across several industries: tele-
com, food and beverage, beauty, financial services, and pharmaceuticals. For confidentiality
reasons, we cannot share the brand names that were advertised, but Table 3 gives some
information about each campaign and the brand advertised. We see that the campaigns
vary considerably in terms of duration, with the shortest campaign being four weeks and the
Table 2: Sample sizes for in-flight sample (first number) and unmatched pre-campaign sample (second number) by survey question format and campaign.

banner rich media video TV
campaign 1 307/1723 93/3419 92/1695 1893/1729
campaign 2 225/2313 36/2339 141/2312 3909/2327
campaign 3 334/919
campaign 4 157/3562 721/3546
campaign 5 78/1199 2338/1199
campaign 6 146/826 239/804 366/807
campaign 7 468/1023 90/1031 83/1068 2518/3348
campaign 8 2269/5966
campaign 9 78/1225 959/1255
campaign 10 189/1400 245/1320 84/1409 1955/2935
campaign 11 258/2135 467/2123 131/2123 3875/3396
campaign 12 820/1980 407/1083
campaign 13 75/957 352/964
campaign 14 53/1277 380/1254
campaign 15 87/971
campaign 16 73/1271 2426/1602
campaign 17 1108/438
campaign 18 1386/1570 57/1425 3658/1453
campaign 19 2118/2241
campaign 20 53/1723 36/3419 648/1729
longest, 36 weeks.
Table 3: Duration of advertising campaigns, penetration of advertised brand in its respective product category, and share of TV GRPs for the four quarters prior to current campaign.

TV weeks online weeks penetration TV GRP share
campaign 1 12 15 0.33 0.48
campaign 2 8 25 0.33 0.47
campaign 3 8 8 0.08 0.36
campaign 4 8 8 0.15 0.00
campaign 5 10 10 new 0.16
campaign 6 19 19 0.02 0.32
campaign 7 32 32 0.12 0.22
campaign 8 36 36 0.19 0.55
campaign 9 12 12 0.21 1.00
campaign 10 27 27 0.17 0.45
campaign 11 12 30 new 0.00
campaign 12 12 30 0.03 0.68
campaign 13 4 4 0.19 0.15
campaign 14 6 6 0.35 0.30
campaign 15 4 4 0.01 0.43
campaign 16 14 14 0.23 0.44
campaign 17 8 8 0.12 0.16
campaign 18 14 14 0.12 0.08
campaign 19 7 7 0.36 0.31
campaign 20 11 11 0.36 0.27
The percentage of US households buying a certain CPG brand or using a service (the
brand's penetration) varies widely across campaigns: we have a new brand (campaign
11), a new line extension (campaign 5), along with several category leaders with a high
penetration of more than 30% (campaigns 1, 2, 14, 19 and 20). The level of advertising
in the four quarters prior to the current campaign also exhibits substantive variation: from
non-existent (campaigns 4 and 11) to 100% of the TV GRPs in the category for campaign
9.
3.2 Recall Measures: In-Flight Sample
The brand-recall analysis as described in section 2 consists only of respondents correct or
incorrect associations of the brand with the message. To collect these data, Nielsen deploys
surveys while an ad campaign is running. When individuals report they have viewed a TV
program that aired a commercial for the focal campaign or when they have visited a web page
featuring an online ad, they are presented with the brand-recall question. Table 4 displays
the average of the responses from the in-flight survey by campaign and format. Because
these estimates do not include an adjustment for a baseline response, they are calculated as
in equation (1), except that ys0 is set to zero:
Σ{s|Xs=1} Ys ws.
Table 4: Percentage of correct linkages of brand and creative across formats and campaigns for
all individuals surveyed in-flight. Standard deviations are reported in parentheses.
banner rich media video TV
campaign 1 0.40 (0.49) 0.30 (0.46) 0.39 (0.49) 0.40 (0.49)
campaign 2 0.39 (0.49) 0.24 (0.43) 0.41 (0.49) 0.37 (0.48)
campaign 3 0.53 (0.50)
campaign 4 0.37 (0.48) 0.35 (0.48)
campaign 5 0.15 (0.35) 0.31 (0.46)
campaign 6 0.44 (0.50) 0.43 (0.50) 0.41 (0.49)
campaign 7 0.35 (0.48) 0.34 (0.48) 0.44 (0.50) 0.42 (0.49)
campaign 8 0.79 (0.41)
campaign 9 0.85 (0.36) 0.78 (0.42)
campaign 10 0.51 (0.50) 0.50 (0.50) 0.80 (0.40) 0.68 (0.47)
campaign 11 0.36 (0.48) 0.36 (0.48) 0.59 (0.49) 0.49 (0.50)
campaign 12 0.58 (0.49) 0.48 (0.50)
campaign 13 0.38 (0.49) 0.18 (0.39)
campaign 14 0.48 (0.50) 0.84 (0.37)
campaign 15 0.34 (0.47)
campaign 16 0.60 (0.49) 0.48 (0.50)
campaign 17 0.55 (0.50)
campaign 18 0.46 (0.50) 0.55 (0.50) 0.55 (0.50)
campaign 19 0.53 (0.50)
campaign 20 0.39 (0.49) 0.49 (0.51) 0.38 (0.49)
Looking at the average brand recall rates in Table 4, we see many substantial brand-
message links. There is also substantial variation both across formats and campaigns. Al-
though the numbers in the table cannot be directly interpreted as a lift measure because
the baseline has not been removed, we can subtract the one traditionally used in practice,
ys0 = 0.25, from the reported numbers to get an estimate of the lift. It is notable that
although many campaigns have a positive lift, quite a few format-campaign combinations
(e.g., banners in campaign 5, rich media in campaign 2, and TV in campaign 13) are below
the baseline of 0.25. These numbers could be indicative of a poor campaign that broke previously established brand-message links or, as we will explore with our initial-conditions methodology, cases in which the baseline should actually be lower.
To formally assess the differences between recall rates for Internet formats and television,
we aggregate across campaigns. Table 5 reports the results of comparing the average recall
rates for campaigns that used Internet formats to the recall rates for TV for these campaigns.
For campaigns that ran some banner ads, the average brand-message recall of banners is 0.45,
whereas it is 0.50 for TV ads, with the difference having a p-value of 0.01. Similarly, among
the campaigns running rich media, the recall is 0.37 for rich media, but significantly greater
at 0.46 for TV. The video ads' recall is significantly greater than TV's (0.50 versus 0.44) in
those campaigns airing some video ads. Based on these data, we might therefore conclude
that TV outperforms banner ads and rich media in terms of brand recall, whereas video
outperforms TV.
Table 5: Comparison of average recall rates for Internet formats vs. TV across campaigns in in-flight sample. Campaigns that do not use a given online format were excluded.

avg. recall t-stat p-value
banner 0.45 -3.24 0.01
TV 0.50
rich media 0.37 -3.92 0.00
TV 0.46
video 0.50 2.88 0.00
TV 0.44
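The format-versus-TV comparisons in Table 5 are differences in two recall proportions. A simple unweighted version of such a test can be sketched as follows; the paper's actual comparisons carry survey weights, and the sample sizes here are invented.

```python
import math

def two_proportion_t(p1, n1, p2, n2):
    """t statistic for the difference between two recall proportions,
    using the unpooled (Welch-style) standard error for 0/1 outcomes."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical counts: banner recall 0.45 on 2,000 responses,
# TV recall 0.50 on 20,000 responses
t = two_proportion_t(0.45, 2000, 0.50, 20000)  # negative: banner below TV
```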
3.3 Recall Measures: Pre-Campaign Sample
For this research project, we augmented the in-flight data collection with a set of surveys,
which were deployed before the advertising campaign was run, to account for pre-existing
differences in respondents' abilities to link the brand and message. As we describe in sec-
tion 4, these pre-campaign surveys can be used to measure more accurately the lift the ad
campaign provides relative to an initial condition than by simply assuming that, absent
advertising, consumers would randomly guess.
Table 6: Percentage of correct linkages of brand and creative across formats and campaigns in pre-campaign survey sample. Standard deviations are reported in parentheses.

banner rich media video TV
campaign 1 0.38 (0.48) 0.39 (0.49) 0.39 (0.49) 0.37 (0.48)
campaign 2 0.35 (0.48) 0.35 (0.48) 0.38 (0.48) 0.36 (0.48)
campaign 3 0.49 (0.50)
campaign 4 0.27 (0.44) 0.29 (0.45)
campaign 5 0.18 (0.39) 0.17 (0.37)
campaign 6 0.26 (0.44) 0.25 (0.44) 0.27 (0.45)
campaign 7 0.48 (0.50) 0.43 (0.49) 0.47 (0.50) 0.53 (0.50)
campaign 8 0.54 (0.50)
campaign 9 0.41 (0.49) 0.50 (0.50)
campaign 10 0.45 (0.50) 0.41 (0.49) 0.47 (0.50) 0.47 (0.50)
campaign 11 0.10 (0.30) 0.09 (0.29) 0.10 (0.31) 0.08 (0.28)
campaign 12 0.43 (0.50) 0.40 (0.49)
campaign 13 0.16 (0.37) 0.17 (0.38)
campaign 14 0.31 (0.46) 0.34 (0.47)
campaign 15 0.23 (0.42)
campaign 16 0.30 (0.46) 0.34 (0.47)
campaign 17 0.33 (0.47)
campaign 18 0.17 (0.38) 0.18 (0.39) 0.19 (0.39)
campaign 19 0.27 (0.44)
campaign 20 0.24 (0.43) 0.23 (0.42) 0.26 (0.44)
Preliminary examination of the average pre-campaign brand-recall rates in Table 6 re-
veals that the recall rates vary substantially across campaigns and that large deviations from
a random guess rate of ys0 = 0.25 are present. As expected, the correct linkages for the new
products (campaigns 5 and 11) are quite low. In line with our intuition, the pre-existing
brand knowledge for campaign 5, which is a line extension, is somewhat higher than for the
entirely new brand in campaign 11. Campaign 18, which has a low share of TV GRPs (8%),
is also characterized by a low level of creative-brand association. By contrast, campaigns
with a relatively high penetration and share of TV GRPs have higher creative-brand associ-
ations. We do not have enough data to fully document a relationship between the campaign
characteristics and the probability of correctly linking a creative to a brand, but sufficient
evidence exists to suggest that subsequent analyses should account for, and possibly attempt
to explain, the presence of systematic variation.
Table 7: Comparison of average recall rates for Internet formats vs. TV across campaigns in pre-campaign survey sample. Campaigns that do not use a given online format were excluded.

avg. recall t-stat p-value
banner 0.31 -3.82 0.01
TV 0.33
rich media 0.32 -4.78 0.00
TV 0.35
video 0.30 -0.93 0.39
TV 0.31
One notable difference in the pre-campaign recall rates reported in Table 6, relative to
the in-flight recall rates in Table 4, is that much less variation is present across formats.
This lack of variation is to be expected because the differences across formats in Table 6
are only in the question asked, not in the respondents' past or future exposure to a given
format (the questions were asked before the campaign had begun, so the respondents could
not have been exposed to the ad).
Table 7 reports a direct comparison of average recall for each Internet format to that for
TV. Both banners and rich media perform slightly worse relative to TV (a difference of -0.02
for banners and -0.03 for rich media), whereas video is statistically indistinguishable from
TV.
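The pairwise comparisons in Tables 7 and beyond are simple two-sample tests on binary recall indicators. As a minimal sketch of such a test (Welch's t-statistic on synthetic 0/1 recall data, not the study's respondents; sample sizes and rates are illustrative only):

```python
import math
import random

def welch_t(x, y):
    """Welch's t-statistic comparing mean recall between two samples of 0/1 indicators."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    # Unbiased sample variances of the two groups.
    vx = sum((xi - mx) ** 2 for xi in x) / (nx - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

# Illustrative synthetic recall indicators (NOT the study's data):
random.seed(0)
banner = [1 if random.random() < 0.31 else 0 for _ in range(2000)]
tv = [1 if random.random() < 0.33 else 0 for _ in range(2000)]
print(round(welch_t(banner, tv), 2))
```

In practice the survey weights would enter the means and variances; the unweighted version above only illustrates the shape of the comparison.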
3.4 Comparison of In-Flight and Pre-Campaign Samples
For the summary statistics of the pre-campaign sample to be considered a valid baseline
to calculate the lift of a campaign, we need to ensure the respondents included in the pre-
campaign sample are comparable to the ones surveyed during the campaign. This may not
be the case, however, for a number of reasons. First, we are only interested in the effect of the
campaign on the exposed individuals, and therefore respondents who are not exposed should
receive a weight of zero in our analysis. Survey respondents in the pre-campaign sample
by definition have not been exposed to the ad at the time they are surveyed, but their
subsequent exposures (if any) have been recorded. We can therefore examine their various media exposures to verify whether they eventually saw an ad in the focal campaign and format. Table
8 reports the percentage of the pre-campaign sample eventually exposed to an ad in the focal
format and campaign. Although this percentage is quite high for TV, many pre-campaign
respondents in the Internet formats were never exposed to the campaign. By contrast, all
in-flight respondents have by definition been exposed. To make the samples comparable, we
thus need to focus only on individuals who were eventually exposed (Xs = 1).
Table 8: Percentage of pre-campaign sample who are eventually exposed to the focal format (i.e., were asked about the respective format).

              banner   rich media   video    TV
campaign 1     0.70       0.47       0.54   0.83
campaign 2     0.63       0.41       0.49   0.88
campaign 3                                  0.96
campaign 4     0.59                         0.94
campaign 5     0.53                         0.94
campaign 6     0.69       0.46              0.87
campaign 7     0.70       0.48       0.40   0.85
campaign 8                                  0.98
campaign 9     0.63                         0.94
campaign 10    0.53       0.64       0.41   0.89
campaign 11    0.50       0.73       0.46   0.82
campaign 12    0.76                         0.86
campaign 13                          0.60   0.91
campaign 14    0.78                         0.91
campaign 15                                 0.98
campaign 16    0.50                         0.98
campaign 17                                 1.00
campaign 18    0.84       0.24              0.84
campaign 19                                 0.99
campaign 20    0.67       0.64              0.94
A second issue is the extent to which the exposed pre-campaign sample and the in-flight
sample are similar in terms of exposures to the different advertising formats. As can be seen
by looking at the averages for both groups reported in Table 9, even those respondents who
were eventually exposed to an ad in the focal campaign have a different rate of exposure
than the respondents included in the in-flight sample. In general, those in the pre-campaign
group have a much higher exposure to TV relative to the in-flight group.
A likely explanation for these differences in media exposures is the different sampling time frames. Whereas the in-flight surveys were collected for the entire duration of the campaign (anywhere between 4 and 36 weeks), the pre-campaign measures were typically collected within a week. To obtain the necessary sample size to
ensure we would have an adequate group of individuals who are eventually exposed to the
focal campaign, the selection of respondents had to be much more aggressive, thus yielding a
potentially different sample. For example, the high TV exposures among the pre-campaign
sample could be attributed to a greater number of professional survey takers that might
have overstated TV exposure rates in order to earn more points on rewardtv.com. Using
a matching methodology, we remove these outliers and create a sample comparable to the
in-flight group.
Table 9: Average number of exposures to different ad formats by campaign. Comparison of pre-campaign (left column) and in-flight (right column) samples.

              banner           rich media       video            TV
              pre     in-fl    pre     in-fl    pre     in-fl    pre     in-fl
campaign 1    5.61    4.74     3.22    2.94     2.45    1.98     10.75   2.87
campaign 2    3.79    3.08     2.08    1.68     2.87    2.04     16.67   3.94
campaign 3                                                       21.96   13.7
campaign 4    4.89    6.27                                       7.34    1.48
campaign 5    3.19    3.41                                       13.91   6.23
campaign 6    5.7     3.72     3.44    3.82                      4.29    2.31
campaign 7    6.58    5.6      3.4     2.82     6.81    4.03     11.51   6.41
campaign 8    1       2.57                                       1.56    11.49
campaign 9    5.79    3.68     1.85    1.67                      11.45   2.65
campaign 10   7.26    9.47     5.48    6.46     3.65    3.48     17.16   2.51
campaign 11   3.28    8.78     4.39    4.49     3.57    4.38     10.65   2.5
campaign 12   4.18    5.55                                       4.23    1.46
campaign 13                                     4.25    4.5      4.33    2.56
campaign 14   2.31    3.02                                       2.95    2.7
campaign 15                                                      1.85    1.23
campaign 16   2.33    1.84                                       11.54   8.55
campaign 17                                                      5       2.2
campaign 18   9.91    7.76     4.8     3.23                      22.89   15.81
campaign 19                                                      2.17    12.02
campaign 20   6.12    5.22     8.63    6.77                      11.31   9.42
4 Matching Methodology
Given our pre-campaign survey data, we conceptualize lift as
∆ = E[Ys1 − Ys0 | Xs = 1] ,
where Ys1 indicates correct association of the message and brand during the campaign by
respondent s and Ys0 indicates correct association before the campaign.3 Numbering the sur-
veys before the campaign as {1,...,S0} and those during the campaign as {S0 + 1,...,S1 + S0},
we would ideally measure
∆ = Σ_{s: s>S0, Xs=1} Ys1 ws1 − Σ_{s: s≤S0, Xs=1} Ys0 ws0 ,    (2)
where the weights ws1 and ws0 ensure the surveyed in-flight and pre-campaign individuals
are representative of the population of exposed individuals. We cannot, however, estimate
the above equation because we do not observe ws0; that is, the weights are only calculated
for the individuals surveyed during the campaign. Furthermore, the analysis and discussion
in section 3 indicate the pre-campaign group is systematically different from the in-flight
group for which we observe the weights. To prune the non-representative pre-campaign
respondents, we employ a matching procedure that restricts the analysis to each in-flight
survey and its nearest-neighbor from the pre-campaign group.
4.1 The Matching Estimator
We match each in-flight respondent s surveyed during the campaign with a set Ms of pre-survey respondents based on a set of variables Zs that we describe below. Then we estimate the following:

∆ = Σ_{s: s>S0, Xs=1} Σ_{m∈Ms} (Ys1 − Y0m) ws1 / |Ms| .    (3)
3 Our notation for s equates surveys and respondents. Given the sampling approach described in section 3, a given respondent could potentially fill out multiple surveys in the repeated cross section. We currently cannot separate such cases to treat them specially.
In the above expression, Ms is the set of pre-campaign respondents who are matched to in-flight respondent s, and Y0m indicates whether the mth matched pre-survey respondent correctly recalled the brand. We divide by the number of matched respondents, |Ms|, such that the total weight for each in-flight respondent s is equal to that respondent's reported weight, ws1.
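Equation (3) translates directly into code; a minimal sketch under the assumption that the match sets Ms have already been formed (all variable names are hypothetical, not from the study):

```python
def adjusted_lift(inflight, matches):
    """Matching estimator of equation (3).

    inflight: list of (y1, w1) pairs -- in-flight recall indicator and survey weight.
    matches:  list of lists; matches[s] holds the pre-campaign recall
              indicators y0 of the respondents matched to in-flight survey s.
    """
    total = 0.0
    for (y1, w1), m in zip(inflight, matches):
        # Each matched pre-survey respondent shares the in-flight weight w1 equally,
        # so the total weight attached to survey s remains w1.
        total += sum(y1 - y0 for y0 in m) * w1 / len(m)
    return total

# Toy example: two in-flight surveys with weights summing to 1.
inflight = [(1, 0.5), (0, 0.5)]
matches = [[0, 1], [0]]  # the first survey has two tied matches
print(adjusted_lift(inflight, matches))  # 0.25
```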
The assumption underlying this estimator is
E[Y0 | s > S0, Z] = E[Y0 | s ≤ S0, Z] .    (4)
In words, we assume that conditional on the matching variables, Zs, the expected response
to the pre-campaign survey is invariant to whether the individual was surveyed before or
after the campaign began. The assumption therefore guarantees our estimator removes any
systematic sampling differences between the pre-campaign and in-flight groups.
4.2 Matching Variables
Given our goal is to compare advertising effectiveness across various media formats, we
decided to focus on media consumption as the most relevant descriptor of the surveyed
individuals. The matching variables Zs include the total number of campaign exposures for
each of the three Internet formats, as well as the total number of TV exposures across all
campaigns in our data.
The Internet formats provide valuable match variables because they are passively ob-
served and thus do not suffer from self-reporting issues. Furthermore, they are highly re-
flective of the type of individual. Specifically, exposure to the campaign's advertisements signifies the individual is in the campaign's target, and the number of exposures provides a
measure of the intensity of viewership of the targeted medium.
We do not match on television exposures within the campaign because they are not
passively observed. A pre-campaign respondent could have been exposed to TV even if we
do not observe TV exposure. However, we include the total television exposures across all
campaigns so that we are matching on a measure of television viewership intensity. The
number of total television exposures also helps us separate out individuals who might take many surveys, because reported television exposures give the respondent the opportunity to take more surveys. Such individuals are down-weighted in Nielsen's estimate of each in-flight respondent's weight, but we need to match on this characteristic to ensure similar down-weighting of pre-campaign respondents who might have reported many exposures.
We use a nearest-neighbor matching approach (Abadie, Drukker, Herr & Imbens 2004,
Abadie & Imbens 2012) in which we find at least one pre-campaign survey to match to each
in-flight survey. As Abadie & Imbens (2012) show, allowing individual observations to be
used as a match more than once lowers the bias of the estimates.
We seek exact matches on the campaigns passively observed Internet exposures and
allow the overall television exposures to sort among ties in terms of shortest distance. If ties
are still present, we include all tied matches, which accounts for |Ms| in equation (3) being
greater than 1 for in-flight survey s. As per equation (3), we include the additional matches
based on their share of the total matches to s. If no exact match exists, we find the nearest
neighbor in terms of the distance between the two vectors Zs for the in-flight survey and Zs for the pre-campaign survey.
Our procedure worked well. For the TV format question, we are able to match exactly
96% of the in-flight respondents on the passively observed Internet exposures. Note that we
exclude any pre-campaign respondents that do not match in-flight respondents, because our
in-flight respondent weights sum to form the true distribution of exposed individuals. For
banners the percentage is 84%, followed by video at 75% and rich media at 69%.
4.3 Causal Interpretation
Because pre-campaign surveys are conducted well before most of the in-flight surveys (given
that some campaigns last 4-5 months), time-varying unobservables could make a causal
interpretation difficult. Moreover, although matching ensures pre-campaign and in-flight
respondents are comparable in terms of ad exposure and media consumption over the entire
time frame of the data, it cannot make up for the time gap between the two surveys. Because
our goal is to compare exposures across different media formats, our primary concern arises
from time-varying unobservables that differ based on the media format to which a respondent
is exposed.
One source of time-varying unobservables we know exists is unobserved television ex-
posures. Due to the inability to passively measure television exposures, we only observe a
subset of the actual exposures to TV ads. However, in trying to assess whether Internet
formats can build brands comparably to television, unobserved television exposures would
likely overstate television effects relative to Internet effects. This overstatement is likely to
occur, because we should expect individuals exposed to television to watch more television
on average than individuals exposed to the Internet, giving television-exposed individuals
relatively more unobserved exposures to the ad campaign.
Other sources of time-varying unobservables include other non-advertising marketing activity by the firm or its competitors. For example, in-store displays do not include messages that would increase association of the message with a brand, but they could increase the salience of the brand in the mind of the customer and therefore increase the focal brand's choice in random guessing. We have no a priori reason to believe television- or Internet-intensive media consumers should see a firm's non-advertising marketing activity at a systematically higher or lower rate. Competitors are likely to target their marketing activity at the same targets as those chosen by the focal brand, and these competitive actions could lead to systematically higher or lower levels of associations as the time since the pre-campaign survey increases. If
competitive advertising creates biases in favor of one format over another, we should expect
these biases to be increasing with time since the pre-survey. We therefore consider our effects
separately for different progressions of our campaigns, measured as the number of previous
exposures respondents have to the campaign.
5 Findings
We discuss the results from the above matching procedure in the context of two separate yet related research questions. First, by examining the pre-campaign brand recall of the exposed population, we can evaluate whether past advertising or brand experiences have led to a divergence in brand associations between Internet- and television-intensive targets. We find that banner- and rich-media-intensive targets have systematically lower levels of brand recall, which suggests past advertising was either insufficient or less effective for Internet media. Second, the pre-campaign brand-recall measures derived after the matching procedure serve as the baseline in our lift measures. The matched pre-campaign sample allows for a more accurate measure of the campaign lift, and it ensures that it is more comparable across formats because it takes into account any pre-existing cross-format differences in brand knowledge.
5.1 Existing Brand Knowledge across Formats
As some consumers have shifted their media consumption away from television toward vari-
ous online formats, a concern arises as to whether brand-building activities can be transferred
easily across formats. Before we examine the effectiveness of various ad platforms, we con-
sider the lasting effects of past campaigns. Specifically, we measure pre-campaign brand
knowledge separately by the media format to which a respondent is eventually exposed
(and presumably favors). Although our data do not allow us to infer why, for instance, an
Internet-exposed individual may have had less pre-campaign knowledge of the brand than
a television-exposed individual, two explanations for the difference in baseline brand knowl-
edge are possible: (i) brands may have devoted fewer past exposures to the Internet formats
the individual views, or (ii) past Internet exposures had less persistent effects.
We are able to assess pre-campaign associations by exposure format because we observe
pre-campaign respondents eventual exposures to the campaign. Table 10 reports the initial
conditions based on the matched pre-campaign surveys. These initial conditions differ from
the ones reported in Table 6 in that they reflect the responses for only those individuals who
are exposed to the format-campaign combination and are matched to an in-flight respondent.
Table 10: Percent of correct brand associations before each campaign in the matched pre-campaign sample (standard deviations in parentheses).

               banner        rich media    video         TV
campaign 1     0.31 (0.46)   0.34 (0.48)   0.32 (0.47)   0.30 (0.46)
campaign 2     0.26 (0.44)   0.24 (0.44)   0.35 (0.48)   0.32 (0.47)
campaign 3                                               0.32 (0.47)
campaign 4     0.36 (0.48)                               0.36 (0.48)
campaign 5     0.24 (0.43)                               0.08 (0.28)
campaign 6     0.24 (0.43)   0.23 (0.42)                 0.14 (0.35)
campaign 7     0.42 (0.49)   0.38 (0.49)   0.50 (0.50)   0.62 (0.49)
campaign 8                                               0.59 (0.49)
campaign 9     0.30 (0.46)                               0.63 (0.48)
campaign 10    0.53 (0.50)   0.45 (0.50)   0.47 (0.50)   0.66 (0.47)
campaign 11    0.07 (0.25)   0.05 (0.21)   0.12 (0.32)   0.07 (0.25)
campaign 12    0.42 (0.49)                               0.42 (0.49)
campaign 13                                0.11 (0.32)   0.13 (0.34)
campaign 14    0.32 (0.47)                               0.66 (0.47)
campaign 15                                              0.14 (0.35)
campaign 16    0.22 (0.42)                               0.41 (0.49)
campaign 17                                              0.18 (0.39)
campaign 18    0.16 (0.36)   0.43 (0.50)                 0.16 (0.37)
campaign 19                                              0.20 (0.40)
campaign 20    0.11 (0.32)   0.08 (0.27)                 0.16 (0.37)
The primary change in Table 10 relative to Table 6 is that substantial variation in brand
recall now exists across formats within a campaign. For example, campaign 7 has a TV
baseline of 0.62, but the baseline is 0.5 or less for the three Internet formats. Alternatively,
campaign 2 has a high baseline on video and TV at 0.35 and 0.32, respectively, but is close
to 0.25 for banners and rich media.
Although the campaign-by-campaign measures are illustrative, our focus is on the aver-
ages across campaigns and within format, where the aggregated sample sizes allow us more
conclusive inference. Table 11 compares average pre-campaign brand recall for each Inter-
net format to the average pre-campaign brand recall for TV. It also compares the matched
estimates with the unmatched estimates.

Table 11: Difference across formats in the percentage of correct brand associations in the pre-campaign sample. Comparison between matched and unmatched samples. Asterisk denotes a significant difference at the 5% level.

              unmatched   matched   exact matches
banner           0.31       0.28        84%
TV               0.33       0.36        97%
rich media       0.32       0.26        69%
TV               0.35       0.35        96%
video            0.30       0.32        75%
TV               0.31       0.30        97%

For campaigns running banner ads, we see that
the initial condition for those exposed to banner ads dropped to 0.28 with matching, which
is significantly lower than the 0.36 for TV. Rich-media matched initial conditions are also
significantly lower than TV at 0.26. Video is indistinguishable from TV in both the matched
and unmatched samples. We suspect video and TV may be similar, because many of the
video ads were for online viewership of episodes from television series (e.g., through Hulu).
The banner and rich-media differences from TV are worth considering. The fact that the target audience for the online ad campaigns has a lower level of existing brand knowledge than the target audience exposed to TV suggests advertisers' efforts to reach this population have been ineffective thus far. The TV population is more familiar with the brand message and is thus better able to correctly link the commercial to the corresponding brand. This finding could be the result of insufficient or ineffective past advertising to Internet-intensive media viewers.
5.2 Comparison of Advertising Lift across Formats
The metric we use to compare the performance of the different advertising formats is the
campaign lift, calculated as the difference in the brand-recall measure between the matched
pre-campaign sample and the in-flight sample (see equation (3)). Table 12 reports the lift by
campaign and format. We observe a dramatic effect across all formats for the new brand in campaign 12. Similarly, we find large and significant effects for campaign 21 (banners, rich media, and TV), campaign 19 (banners and TV), campaign 15 (banners and TV), campaign 7 (banners and TV), and campaign 6 (TV and video). Some campaigns have a much greater banner lift than TV (e.g., 9 and 16). Rich media provides the highest lift in campaign 20. Video outperforms other formats in campaigns 10 and 13. These differences suggest more exploration is needed when data become available for a larger number of campaigns in order to establish a relationship between campaign characteristics and the effectiveness of the media vehicles.

Table 12: Adjusted lift by campaign and format. Asterisk denotes significance at the 5% level.

               banner   rich media   video     TV
campaign 1      0.10      -0.05       0.07    0.11
campaign 2      0.13       0.00       0.06    0.05
campaign 3                                    0.21
campaign 4      0.00                         -0.01
campaign 5     -0.09                          0.23
campaign 6      0.20       0.20               0.27
campaign 7     -0.08      -0.04      -0.05   -0.20
campaign 8                                    0.20
campaign 9      0.54                          0.15
campaign 10    -0.02       0.05       0.33    0.02
campaign 11     0.30       0.31       0.48    0.43
campaign 12     0.17                          0.06
campaign 13                           0.27    0.05
campaign 14     0.15                          0.18
campaign 15                                   0.20
campaign 16     0.38                          0.07
campaign 17                                   0.37
campaign 18     0.30       0.12               0.39
campaign 19                                   0.33
campaign 20     0.28       0.41               0.22

Table 13: Aggregate lift comparison across formats.

              avg. lift   t-value   p-value
banner           0.17       1.56      0.12
TV               0.14
rich media       0.12       0.38      0.71
TV               0.10
video            0.19       1.60      0.11
TV               0.14
Comparing the average performance of the advertising formats across the campaigns, we
find all Internet formats perform slightly better than TV, with video having the highest
relative lift at 0.05, banners at 0.03, and rich media at 0.01. However, the p-values for video
and banners are only 0.11 and 0.12. Thus, accounting for the differences in pre-existing
brand knowledge by format leads to a different inference regarding the relative performance
of TV versus the online advertising formats. When only in-flight recall rates are compared, TV appears to be the most impactful medium; adjusting for the initial conditions, Internet formats perform just as well or perhaps even better.
One question that arises in comparing in-flight lift across campaigns is whether respon-
dents were exposed the same number of times across campaigns at the time they are surveyed.
Table 14 reports the average number of exposures to the surveyed format for each Internet
versus TV comparison. Exposure rates among the banner- and rich-media-exposed/surveyed respondents are significantly greater than TV exposures for the same campaigns. Recall, however, that not all TV exposures are observed. Video exposures are also greater than TV exposures, though the difference is not statistically significant.

Table 14: Average number of exposures to focal format vs. TV at the time a respondent takes a survey in the focal format.

              exposures   t-value   p-value
banner           2.57       4.91      0.00
TV               1.86
rich media       2.58       3.78      0.00
TV               1.86
video            2.24       1.40      0.16
TV               2.01

Table 15 reports the difference in lift between TV and each Internet format separately by the total number of exposures to the campaign. Once we condition on exposures, we do not see any format performing systematically better. Even for a given exposure level, only one comparison yields a significant result (banner lift 0.23 greater than TV at six exposures). Overall, this finding suggests that the ability of Internet exposures to produce lift measures comparable to those of TV is not due to systematically greater numbers of exposures to the campaign.

Table 15: Adjusted lift by number of exposures prior to survey for the pairwise comparison to TV.

                     banner   rich media   video
1 exposure            0.01      -0.04       0.06
2 exposures           0.01       0.07       0.08
3 exposures           0.03       0.03      -0.08
4 exposures          -0.02      -0.03      -0.08
5 exposures          -0.14      -0.18      -0.18
6 exposures           0.23       0.13      -0.07
avg. diff. in lift    0.03       0.01       0.05

Note: * denotes significance at the 10% level, ** significance at the 5% level.
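The exposure-stratified comparison in Table 15 amounts to computing the weighted lift separately within each exposure count. A schematic sketch with hypothetical inputs (y0_bar stands in for the mean recall of a survey's matched pre-campaign respondents):

```python
from collections import defaultdict

def lift_by_exposures(records):
    """records: iterable of (n_exposures, y1, y0_bar, w1) per in-flight survey.

    Returns {n_exposures: weighted lift within that exposure stratum}.
    """
    lift = defaultdict(float)
    for n, y1, y0_bar, w1 in records:
        # Same estimator as equation (3), accumulated stratum by stratum.
        lift[n] += (y1 - y0_bar) * w1
    return dict(lift)

rows = [(1, 1, 0.5, 0.5), (1, 0, 0.0, 0.5), (2, 1, 0.0, 1.0)]
print(lift_by_exposures(rows))  # {1: 0.25, 2: 1.0}
```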
6 Conclusions
In this research, we propose a methodology for establishing a format-specific baseline to
assess the lift in brand recall due to an advertising campaign. We supplement the in-flight
brand-message surveys with a set of pre-campaign surveys and match the pre-campaign
respondents to those eventually exposed to the campaign in order to control for pre-existing
brand knowledge. The rich data set we have, tracking the response to TV and Internet
advertising for 20 campaigns across a variety of industries, provides us with comparable
measures to assess the relative performance of the different advertising formats.
We find a systematically lower level of brand knowledge among individuals who are
surveyed about banner and rich media. Without a format-specific baseline, a researcher
might therefore draw the wrong conclusion and ascribe too much importance to TV's effect
on brand recall. Once the difference in pre-existing knowledge is taken into account, there
is no significant difference in the effectiveness of TV and Internet ads in terms of correct
brand identification. This result underscores the importance of pre-campaign surveys and
our matching methodology for comparing ad performance across media formats.
The goal of our research was to assess the widely held belief that TV outperforms Internet
formats as a brand-building platform, and we therefore focused on head-to-head comparisons
of TV to the Internet formats. Nevertheless, as advertisers decide how to use these various
formats, knowledge of the complementarities between media will be important. Researchers
in marketing have long explored the potential synergies in multimedia communications (see,
e.g., Naik & Raman (2003) or Dijkstra, Buijtels & van Raaij (2005) for recent examples), but the empirical study of the phenomenon in a field setting is still challenging. Studies
that randomly vary TV and Internet pulses across geographic markets may be best suited
to disentangle the optimal combination and sequencing of ad formats. This more detailed
analysis was not possible in our context, where most advertising was at the national level, so
a focus on brands involved in geo-targeted campaigns may be the most promising approach.
Another avenue for future research would be to investigate more formally the link between
category characteristics and effectiveness of different types of campaigns. Our study and the
existing literature on advertising point to a number of potentially relevant brand and category
factors such as the maturity level of the category, the stage of the product life cycle (new
introduction versus established brand), and the amount of previous advertising, possibly as
share of voice in the category. In addition, the type of consumer decision making in the product category (whether it is a low-involvement or a high-involvement process) will also likely play a role in determining which media format will be most effective.
Finally, our research can be extended by practitioners to include cost measures in comparing the relative performance across ad formats and guiding media budget allocation decisions. As of now, online advertising still appears to be more cost-effective. We anticipate, though, that once the brand-building potential of Internet formats has been firmly established, the prices for online advertising will increase to reflect their relative performance.
References
Abadie, A., Drukker, D., Herr, J. & Imbens, G. (2004). Implementing matching estimators for average treatment effects in Stata, The Stata Journal 4(3): 290–311.

Abadie, A. & Imbens, G. (2012). Bias-corrected matching estimators of average treatment effects, Journal of Business and Economic Statistics 29(1): 1–11.

Assmus, G., Farley, J. U. & Lehmann, D. (1984). How advertising affects sales: Meta-analysis of econometric results, Journal of Marketing Research 21(1): 65–74.

Briggs, R. & Hollis, N. (1997). Advertising on the web: Is there response before click-through?, Journal of Advertising Research pp. 33–45.

Dijkstra, M., Buijtels, H. & van Raaij, F. (2005). Separate and joint effects of medium type on consumer responses: A comparison of television, print, and the Internet, Journal of Business Research 58(3): 377–386.

Dreze, X. & Hussherr, F.-X. (2003). Internet advertising: Is anybody watching?, Journal of Interactive Marketing 17(4): 8–23.

Franzen, G. (1994). Advertising Effectiveness: Findings from Empirical Research, NTC Publications, Henley-on-Thames, U.K.

Goldfarb, A. & Tucker, C. (2011). Online advertising, Advances in Computers, Vol. 81, Elsevier.

Heckman, J., Ichimura, H. & Todd, P. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme, Review of Economic Studies 64: 605–654.

Hutchinson, W. & Moore, D. (1984). Issues surrounding the examination of delay effects of advertising, in T. Kinnear (ed.), Advances in Consumer Research, Vol. 11, Provo, UT: Association for Consumer Research, pp. 650–655.

Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review, The Review of Economics and Statistics 86(1): 4–29.

Krishnan, S. & Chakravarti, D. (1999). Memory measures for pretesting advertisements: An integrative conceptual framework and a diagnostic template, Journal of Consumer Psychology 8(1): 1–37.

Lewis, R. & Reiley, D. (2011). Does retail advertising work?, Technical report, Yahoo! Research.

Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B. & Stevens, M. E. (1995a). How advertising works: A meta-analysis of 389 real world split cable TV advertising experiments, Journal of Marketing Research 32: 125–139.

Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B. & Stevens, M. E. (1995b). A summary of fifty-five in-market experimental estimates of the long-term effects of advertising, Marketing Science 14(3): G133–140.

Naik, P. & Raman, K. (2003). Understanding the impact of synergy in multimedia communications, Journal of Marketing Research 40(4): 375–388.

Rossiter, J. & Bellman, S. (2005). Marketing Communications: Theory and Applications, Pearson Education.

Wells, W. (1964). Recognition, recall and rating scales, Journal of Advertising Research 4(3): 2–8.