
  • 7/28/2019 Internet vs TV 2

    1/33

    Internet vs. TV Advertising: A Brand-Building

    Comparison

    Michaela Draganska

    The Wharton School

    Wesley R. Hartmann

    Stanford GSB

    Gena Stanglein

    Google

    Abstract

A key issue for media planners determining the share of their advertising budgets to spend on Internet advertising is whether Internet advertising can build brands as effectively as television advertising. To address this question, we extend traditional brand-message recall measurement to facilitate comparisons between Internet formats and television. Specifically, we supplement brand-message surveys conducted during the campaign with a set of pre-campaign surveys to control for pre-existing brand knowledge, and use a matching procedure to ensure the pre-campaign sample is comparable to the in-flight one.

For our analysis, we use a rich data set comprising 20 campaigns across multiple industries, ranging from consumer packaged goods to telecommunications. We find substantial cross-brand variation in pre-existing knowledge as well as variation across advertising formats. In particular, individuals exposed to Internet display ads have significantly lower levels of pre-existing brand knowledge than television viewers. Such differences in initial conditions suggest biases in comparisons between Internet and television ads, and possibly a more general failure of the brands to establish lasting associations among individuals shifting media consumption from TV to the Internet. Incorporating these pre-existing differences between media formats results in brand lift measures for Internet ads that are statistically indistinguishable from comparable television lift measures.

    Keywords: advertising, display, television, Internet.

The authors would like to thank Oscar Mitnik and Amogh Vasekar for valuable assistance, as well as Rawley Cooper, Brent Davis and Scott McKinley at Nielsen for their help in executing the study. Draganska and Hartmann served as consultants during the survey design and administration phases of the project.

Email: [email protected]
Email: [email protected]
Email: [email protected]


    1 Introduction

    Over the past decade advertising expenditures have shifted from traditional media to the

    Internet. In 2011, online advertising in the United States alone reached $32 billion and is

    projected to reach $62 billion by 2016 (eMarketer, February 2012 report). Internet portals

    determined to use their inventory to substitute for traditional advertising formats have turned

    to quantitative metrics to illustrate the advantages of online advertising. They are armed

    with the ability to readily observe behavioral responses on the web, such as click-through

    rates, and to conduct large-scale online experiments to provide the most accurate measure of

    the effectiveness of the ads in driving consumer purchasing decisions (Lewis & Reilley 2011,

    Goldfarb & Tucker 2011).

    Nevertheless, many advertisers still hesitate to shift spending from television campaigns

    to the Internet, pointing to the established role of TV advertising in building brands. The

solid experimental evidence quantifying the behavioral response to Internet advertising does not seem to be a sufficient argument for shifting, because no direct comparison to the effectiveness of TV

    as a brand-building medium is available. In general, TV experiments are costly and thus not

scalable for widespread application to allow for a comparative study of the effectiveness of

    online and offline campaigns. Older experimental studies on TV advertising, most notably

by Lodish, Abraham, Kalmenson, Livelsberger, Lubetkin, Richardson & Stevens (1995a, 1995b), do not include Internet data. Moreover, given the typical sample sizes for such experiments, detecting a significant effect of advertising on sales frequently fails for lack of statistical power (Lewis & Reilley 2011), so we cannot rely on them here.

    For that reason and because of the perception that TV advertising is the main medium

    for brand building, the metrics typically used to assess its effects are brand awareness and

    preference. The rationale is that, although direct links to eventual purchase are sometimes

    possible, brand advertising on television is primarily aimed at influencing the mindset of a

    customer who may purchase anytime within a reasonably long horizon (Assmus, Farley &


    Lehmann 1984). By contrast, the effect of online advertisements has been measured mostly

    on outcomes such as click-through rates and generated sales.

    A few recent studies have questioned the emphasis on sales measures and have pointed to

    the brand-building potential of the Internet (Briggs & Hollis 1997, Dreze & Hussherr 2003).

    In one of the earliest studies of online advertising, Briggs & Hollis (1997) show that banner

    ads can also have an effect on brand awareness and image, even in the absence of a behavioral

    response such as a click-through. Using eye-tracking devices in conjunction with a large-scale

    survey, Dreze & Hussherr (2003) find that consumers avoid looking at banners, but there is

    still an effect on brand recall measures, suggesting a pre-attentive level of processing. This

research implies that attitudinal measures may be appropriate not just for assessing the effectiveness of TV commercials but for that of online advertising as well.

    To date, however, no field data have been available to enable media planners to conduct

    an apples-to-apples comparison of the advertising effectiveness of online and offline media in

    terms of creating brand awareness and establishing brand associations. This paper seeks to

fill this gap and measure Internet advertising performance, specifically the performance of various non-search advertising formats, according to the metrics advertisers have historically

    relied on for their television campaigns.1

    We have a unique data set of 20 advertising campaigns spanning a wide variety of product

    categories and industries. In addition to TV commercials, we have data for Internet banner,

    rich media and video ads. The advertising campaigns use the online and TV advertising

    formats concurrently, and the effect of the commercials is assessed using the same brand-

    recall measure (ability of respondent to correctly link creative to brand) for all advertising

    formats, thus providing the data for a valid comparison of the effectiveness of the different

    media.

1 At the start of 2011, Google, The ARF, Nielsen, Stanford, and Wharton collaborated on an initiative to enhance the media planning and buying process. The goal was to quantify cross-media ad-format effectiveness and derive the relative impact of ad formats. The first phase of this project was a pilot measuring the brand cut-through (i.e., ability of consumers to correctly link a brand to a creative) of ad formats across ad campaigns.


    The performance of an ad campaign is gauged relative to a baseline and is referred to

    as lift. One could posit, as is common in the industry, that absent advertising consumers

    would randomly associate brands with commercials. However, especially for mature brands,

    assuming consumers do not have any pre-existing brand knowledge due to exposure to past

    advertising, word of mouth, or other experiences with the brand is naive. In addition, for us

    to compare advertising effects for a given campaign across formats, potential customers who

    extensively use the Internet and those who predominantly watch TV need to have the same

    level of pre-existing familiarity with the brand. If the existing stock of past advertising differs

by media behavior and is thus dependent on the type of ad format to which an individual is

    likely to be exposed, using a constant baseline across formats would no longer yield a valid

    comparison.

    To account for such potential disparities in the pre-existing familiarity with the brand

across media, we have modified the traditional television recall methodology to include a pre-campaign survey to obtain the initial conditions of the advertising stocks for consumers with different media-consumption habits. We avoid a testing bias by employing a repeated cross-section design rather than a true panel; that is, we measure pre-campaign brand recall

    and in-flight (during the campaign) brand recall for separate sets of consumers.

We ensure the comparability of the pre-campaign and in-flight survey groups by employing a nearest-neighbor matching procedure (Abadie & Imbens 2012). This technique allows

us to select only those individuals from the pre-campaign sample who exhibit media-consumption behavior similar to that of the individuals surveyed during the campaign. Having

this pre-campaign measure for an equivalent group gives us a much more accurate baseline for establishing the lift of a campaign than assuming random guessing, as is typically done in the industry.
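The matching idea can be sketched in a few lines. This is a simplified illustration with hypothetical media-consumption profiles (weekly TV and web hours), not the Abadie & Imbens estimator itself:

```python
# Simplified nearest-neighbor matching on media-consumption covariates
# (hypothetical profiles; the paper uses the Abadie & Imbens procedure).
def nearest_neighbor(target, pool):
    # squared Euclidean distance over (tv_hours, web_hours)
    return min(pool, key=lambda p: sum((a - b) ** 2 for a, b in zip(target, p)))

in_flight = [(20.0, 5.0), (2.0, 30.0)]                     # surveyed during the campaign
pre_campaign = [(19.0, 6.0), (25.0, 1.0), (3.0, 28.0), (10.0, 10.0)]

matched = [nearest_neighbor(r, pre_campaign) for r in in_flight]
print(matched)  # -> [(19.0, 6.0), (3.0, 28.0)]
```

The heavy-TV in-flight respondent is paired with a heavy-TV pre-campaign respondent, and likewise for the heavy-Internet one, so each group's baseline comes from comparable media consumers.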

    We find substantial differences in the pre-existing levels of brand knowledge both across

    campaigns and across advertising formats. In particular, respondents who were exposed

    predominantly to the Internet formats had a lower level of pre-existing brand knowledge


    than TV viewers. Our analysis further reveals that ignoring the initial conditions results

    in different conclusions regarding the relative effectiveness of TV versus online formats.

    Comparing the impact of the three online formats - banner ads, rich media, and video - to

    commercials aired on TV using the traditional measure of in-flight brand recall, we find TV is

    superior to the Internet. Upon adjusting for the pre-existing differences in brand knowledge

    by format, however, we find that Internet ad performance is statistically indistinguishable

    from TV.

    In the next section, we provide some background on the traditional brand-recall measures

    and define the conditions under which the methodology can be interpreted causally. Section

    3 explains the data-collection procedure and provides a description of the variables used in

    the analysis. We proceed by outlining our empirical strategy in section 4 and then present

    the findings in section 5. Section 6 concludes with directions for future research.

    2 Traditional Recall Methodology

    A long-established practice in advertising research is to survey individuals who were exposed

    to an ad the previous day to determine the extent to which they recall the ad message, the

    brand, and can link the message to the corresponding brand (Rossiter & Bellman 2005).

    Although these attitudinal measures only approximate the effect on purchase behavior that

    advertisers are ultimately interested in, they have gained wide acceptance and usage. In

    an early study, Wells (1964) compared the different recognition, recall and rating scales

employed in practice and concluded that recall scores, which reflect an advertisement's ability to register the sponsor name and to deliver a meaningful message to the consumer,

    are particularly trustworthy. More recently, Krishnan & Chakravarti (1999) review existing

    memory tests for assessing advertising effectiveness and underscore their value across a wide

    range of advertising objectives.

    The ad message could inform the viewer about the existence or functional attributes of

    the brand, or establish non-functional brand associations. It produces memory traces about


brand-specific and message-specific information, about the product category, evaluative reactions, and brand identification (Hutchinson & Moore 1984). For the message to have an

    effect, the consumer needs to know which brand is being advertised. Empirical studies have

    shown that this is a nontrivial task, as only about 40% of consumers who have viewed a

    commercial recall the sponsor of the message (Franzen 1994). Establishing brand-message

links therefore is a critical input to brand building. In the present research, we focus specifically on the question of whether a respondent prompted with a description of the ad can

    recall the brand.

    Message-recall studies have a few reasons for selecting individuals who viewed the ad the

day before. First, the goal is to assess the creative's ability to link the brand and message,

    and not necessarily to assess the quality of the message itself. For example, one could

    imagine assessing recall several days after the individuals view the ad to see whether the

    ad sticks. This measure, however, says more about the memorability of the message than

the creative's ability to help the brand cut through and get viewers' attention. Second,

exposure is traditionally inferred based on self-reported viewing of a TV program during which a commercial was aired (opportunity to see). That is, respondents have been required

    to recall their program viewership to establish exposure. Doing so for a longer period of

    time can result in too much error. Passive measurement of exposure, for example, through

    a meter installed on the TV set or through other tracking devices, can resolve this problem,

    but such measurement is not available at a large enough scale for all advertising.

    To describe the traditional methodology, we begin by introducing some notation. Let

Ys be an indicator for whether respondent s can correctly select the brand from a multiple-choice list after being prompted with a description of the advertising message. Xs is an

    indicator for whether respondent s was exposed to the ad. Finally, let ys0 be a probabilistic

    assessment of how well respondent s could have guessed the associated brand before the ad

was run. Then the traditional recall methodology defines τ = E[Y - y0|X = 1] to be the

    expected lift among the exposed population in linking the brand with the message. The


estimator for τ is

τ̂ = Σ_{s: Xs = 1} (Ys - ys0) ws,    (1)

where the summation runs over respondents who have been exposed to the ad, and ws

    weighs respondents based on how representative they are of the entire population exposed

    to the ad.2 If respondents are randomly drawn from the exposed population, there is no

    need for the weight ws. This term arises here because of selection issues market research

    organizations have in recruiting their panels.
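Equation (1) amounts to a weighted average of per-respondent lifts over the exposed sample; a minimal sketch with hypothetical responses and weights:

```python
# Weighted lift estimator of equation (1), on hypothetical data:
# each tuple is (Y_s, y_s0, raw weight w_s) for an exposed respondent (X_s = 1).
exposed = [
    (1, 0.25, 1.0),
    (0, 0.25, 0.5),
    (1, 0.25, 1.5),
    (1, 0.25, 1.0),
]
total_w = sum(w for _, _, w in exposed)          # normalize weights to sum to one
tau_hat = sum((y - y0) * (w / total_w) for y, y0, w in exposed)
print(tau_hat)  # -> 0.625
```

Here every respondent is assigned the industry's constant baseline of 0.25; the methodological point of the paper is precisely that y_s0 should instead come from the matched pre-campaign sample.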

The baseline ys0 captures consumers' past interactions with the brand and provides a

    measure of the extent to which its advertising has established an association between brand

    and message. One might expect successful brands to already have a reasonably high baseline

    association with the message because the message is probably related to associations they

    have previously communicated. On the other hand, a new brand may have no pre-existing

    associations that could be tied to the message, resulting in a small baseline.

    In practice, ys0 is typically a predetermined constant, such as the success rate of guessing

    at random, that is the same for all respondents. Obtaining a more accurate measure of the

    baseline ys0 by establishing an initial condition for the campaign is important both for the lift

    measurement above as well as for providing the advertiser with information about how well

past campaigns have imprinted a brand image. Furthermore, if τ is to be compared across ad

    formats, recognizing that individuals exposed to different formats may systematically differ

    in their level of pre-campaign associations is critical.

The traditional recall methodology can be characterized as trying to measure a treatment effect on the treated population (Heckman, Ichimura & Todd 1997, Imbens 2004). This

2 The canonical message-recall measure focuses on assessing a single airing of an ad. Yet practitioners often group together multiple ads in a day as well as ads aired across multiple weeks of the campaign. Nevertheless, the estimator in equation (1) is still applied, but the meaning may change because responses for ads later in the campaign could involve more campaign exposures than those for ads earlier in the campaign. Practitioners have attempted to account for multiple campaign exposures by considering the build and/or decay in the brand associations throughout the campaign. With enough surveys, one could repeat the above analysis at each point in time, but more often the researcher tries to estimate how the responses vary with how far along the campaign is in terms of either time or total exposures.


    terminology arises from the focus on only measuring the effect for those individuals who

were exposed, that is, conditioning on X = 1 in τ = E[Y - y0|X = 1].

    The primary challenge to a causal interpretation of recall studies is the establishment

    of a control condition. Because the same individual cannot be simultaneously exposed and

    unexposed, measuring ys0 for a respondent who is exposed is typically impossible. To clarify

the problem, we decompose the lift measure τ = E[Y - y0|X = 1] into two expectations: τ = E[Y|X = 1] - E[y0|X = 1]. The first component of this expression,

    the probability of correctly identifying the brand if an individual was exposed to the ad

    campaign, E[Y|X = 1], can be easily obtained from observed recall and exposure data. The

    latter component, E[y0|X = 1], requires assessing the control outcomes, that is assessing

    whether a respondent would have correctly linked the creative to the brand without seeing

    the ad. This measurement requires experimentation and presents a particular challenge

    for media such as television. In section 4 we propose a method to obtain this measure by

    augmenting the traditional methodology described above with a pre-campaign survey. Before

    we proceed, we first introduce the data set in the next section.
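The decomposition of lift into an in-flight recall rate minus a matched pre-campaign baseline can be sketched numerically (all numbers hypothetical):

```python
# Hypothetical sketch of lift = E[Y | X=1] - E[y0 | X=1]:
# in-flight responses estimate the first expectation; responses from a
# matched pre-campaign sample stand in for the unobservable baseline.
in_flight_recall = [1, 0, 1, 1, 0, 1, 1, 0]      # Y for exposed respondents
matched_pre_recall = [0, 1, 0, 0, 1, 0, 0, 0]    # baseline proxy y0

e_y = sum(in_flight_recall) / len(in_flight_recall)        # 0.625
e_y0 = sum(matched_pre_recall) / len(matched_pre_recall)   # 0.25
lift = e_y - e_y0
print(lift)  # -> 0.375
```

The first term is directly observable; the second is the counterfactual that the pre-campaign surveys of this study are designed to supply.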

    3 Data

    3.1 Data Collection and Variables

The data-collection effort employs Nielsen's TV Brand Effect panel. This panel consists of

    a large number of participants who reveal their advertising exposures across Internet and

    television formats and answer creative and brand-recall survey questions on rewardtv.com.

    The panel consists of more than six million registered members, with a weekly average

of 26,000 participants. On average, a panelist visits the rewardtv.com site 1.5 times a week and takes 1.7 surveys per visit. Approximately 83% of panelists are new each month.

    Because we rarely observe the same individual for longer stretches of time, the panel is best

    considered a repeated cross section, which limits our ability to make before-after comparisons

    of the same individual. However, repeatedly asking a respondent about the same ad and set


    of brands may lead to conditioning effects (testing bias), so not having a long time series is

    not necessarily a negative feature in this setting.

    Nielsen recruits panelists across various Internet portals and sites and through word of

    mouth. To maximize daily participation, the site provides a lot of entertainment content,

    along with sweepstakes, auctions, and discounts. The incentives are soft, though, thus

    ensuring a high turnover and minimizing the potential for conditioning effects. Nielsen

    conducts periodic checks to ensure the panelists exhibit the same TV viewing and Internet

    usage behavior as other Nielsen panelists, and uses weights to ensure the representativeness

    of each surveyed individual. The first survey for new panelists is eliminated to allow for a

    training/experimentation period, and any abnormal participation and response patterns are

    carefully examined.

    In addition to TV commercials, we investigate three online formats - banner ads, rich

media, and video. Banner ads are ads, with or without animation, with which the user cannot interact. Examples include overlays on video content, companion banners, wallpapers, and skyscrapers. Video is any streaming video: pre-roll, post-roll, or in-roll. Rich media are any ads with which the mouse can interact without necessarily activating a click-through, such as expandable ads, interactive game ads, and corner peels.

    To record online ad exposure, online ad creatives are tagged and then linked to the

    panelists via cookies on their computers. Provided cookies are not erased and the user does

    not change computers, Internet exposures are complete for the duration of the campaign

irrespective of whether an individual logs in to rewardtv.com. Television exposure is inferred

    when a respondent logs in to rewardtv.com and states that on the preceding day, she watched

    a program that is known to have run an advertisement from the campaign (opportunity to

    see). For TV exposures, we thus do not observe exposures an individual may have had prior

    to logging onto rewardtv.com.

    When an individual logging in is identified as having been exposed to an ad, she is

    presented with a description of a scene from a commercial (for an example, see Table 1).


    This description often comes in the form of a question assessing whether the respondent can

    recall the creative. Next, the respondent is asked to indicate which of four listed brands the

    commercial was for.

    Table 1: Example for creative-recall and brand-recall questions

In a commercial during this show, who spoke directly to the camera and said, "I just bought stock... you just saw me buy stock," as he sat at a computer keyboard?

Well-spoken baby who eventually spat up all over the place

Monkey wearing a custom-tailored suit and a fine silk tie

    Simple peasant from the past who came from a rural village

    Alien from outer space who did not speak earth language

    What was this a commercial for?

E*Trade

    TD Ameritrade

    Scottrade

    Charles Schwab

    Questions are asked in the same way for all formats. Brand recall, however, is only

    measured conditional on creative recall in the case of TV, as opposed to rich media, video,

    and banner ads, where all responses are recorded. To keep the data comparable, we retain

    individuals who answered the creative-recall question correctly for all formats. The sample

    sizes by format and campaign are reported in Table 2.

We collected data for 20 advertising campaigns run in 2011 across several industries: telecom, food and beverage, beauty, financial services, and pharmaceuticals. For confidentiality

    reasons, we cannot share the brand names that were advertised, but Table 3 gives some

    information about each campaign and the brand advertised. We see that the campaigns

    vary considerably in terms of duration, with the shortest campaign being four weeks and the


Table 2: Sample sizes for in-flight sample (first number) and unmatched pre-campaign sample (second number) by survey-question format and campaign.

banner  rich media  video  TV
campaign 1  307/1723  93/3419  92/1695  1893/1729
campaign 2  225/2313  36/2339  141/2312  3909/2327
campaign 3  334/919
campaign 4  157/3562  721/3546
campaign 5  78/1199  2338/1199
campaign 6  146/826  239/804  366/807
campaign 7  468/1023  90/1031  83/1068  2518/3348
campaign 8  2269/5966
campaign 9  78/1225  959/1255
campaign 10  189/1400  245/1320  84/1409  1955/2935
campaign 11  258/2135  467/2123  131/2123  3875/3396
campaign 12  820/1980  407/1083
campaign 13  75/957  352/964
campaign 14  53/1277  380/1254
campaign 15  87/971
campaign 16  73/1271  2426/1602
campaign 17  1108/438
campaign 18  1386/1570  57/1425  3658/1453
campaign 19  2118/2241
campaign 20  53/1723  36/3419  648/1729


    longest, 36 weeks.

Table 3: Duration of advertising campaigns, penetration of the advertised brand in its respective product category, and share of TV GRPs for the four quarters prior to the current campaign.

             TV weeks  online weeks  penetration  TV GRP share
campaign 1   12        15            0.33         0.48
campaign 2   8         25            0.33         0.47
campaign 3   8         8             0.08         0.36
campaign 4   8         8             0.15         0.00
campaign 5   10        10            new          0.16
campaign 6   19        19            0.02         0.32
campaign 7   32        32            0.12         0.22
campaign 8   36        36            0.19         0.55
campaign 9   12        12            0.21         1.00
campaign 10  27        27            0.17         0.45
campaign 11  12        30            new          0.00
campaign 12  12        30            0.03         0.68
campaign 13  4         4             0.19         0.15
campaign 14  6         6             0.35         0.30
campaign 15  4         4             0.01         0.43
campaign 16  14        14            0.23         0.44
campaign 17  8         8             0.12         0.16
campaign 18  14        14            0.12         0.08
campaign 19  7         7             0.36         0.31
campaign 20  11        11            0.36         0.27

The percentage of US households buying a given CPG brand or using a service

    (penetration of the brand) varies widely across campaigns: we have a new brand (campaign

    11), a new line extension (campaign 5), along with several category leaders with a high

    penetration of more than 30% (campaigns 1, 2, 14, 19 and 20). The level of advertising

    in the four quarters prior to the current campaign also exhibits substantive variation: from

    non-existent (campaigns 4 and 11) to 100% of the TV GRPs in the category for campaign

    9.

    3.2 Recall Measures: In-Flight Sample

The brand-recall analysis as described in section 2 consists only of respondents' correct or

    incorrect associations of the brand with the message. To collect these data, Nielsen deploys


    surveys while an ad campaign is running. When individuals report they have viewed a TV

    program that aired a commercial for the focal campaign or when they have visited a web page

    featuring an online ad, they are presented with the brand-recall question. Table 4 displays

    the average of the responses from the in-flight survey by campaign and format. Because

    these estimates do not include an adjustment for a baseline response, they are calculated as

    in equation (1), except that ys0 is set to zero:

Σ_{s: Xs = 1} Ys ws.

    Table 4: Percentage of correct linkages of brand and creative across formats and campaigns for

    all individuals surveyed in-flight. Standard deviations are reported in parentheses.

banner  rich media  video  TV
campaign 1  0.40 (0.49)  0.30 (0.46)  0.39 (0.49)  0.40 (0.49)
campaign 2  0.39 (0.49)  0.24 (0.43)  0.41 (0.49)  0.37 (0.48)
campaign 3  0.53 (0.50)
campaign 4  0.37 (0.48)  0.35 (0.48)
campaign 5  0.15 (0.35)  0.31 (0.46)
campaign 6  0.44 (0.50)  0.43 (0.50)  0.41 (0.49)
campaign 7  0.35 (0.48)  0.34 (0.48)  0.44 (0.50)  0.42 (0.49)
campaign 8  0.79 (0.41)
campaign 9  0.85 (0.36)  0.78 (0.42)
campaign 10  0.51 (0.50)  0.50 (0.50)  0.80 (0.40)  0.68 (0.47)
campaign 11  0.36 (0.48)  0.36 (0.48)  0.59 (0.49)  0.49 (0.50)
campaign 12  0.58 (0.49)  0.48 (0.50)
campaign 13  0.38 (0.49)  0.18 (0.39)
campaign 14  0.48 (0.50)  0.84 (0.37)
campaign 15  0.34 (0.47)
campaign 16  0.60 (0.49)  0.48 (0.50)
campaign 17  0.55 (0.50)
campaign 18  0.46 (0.50)  0.55 (0.50)  0.55 (0.50)
campaign 19  0.53 (0.50)
campaign 20  0.39 (0.49)  0.49 (0.51)  0.38 (0.49)

    Looking at the average brand recall rates in Table 4, we see many substantial brand-

message links. There is also substantial variation both across formats and campaigns. Although the numbers in the table cannot be directly interpreted as a lift measure because

    the baseline has not been removed, we can subtract the one traditionally used in practice,

    ys0 = 0.25, from the reported numbers to get an estimate of the lift. It is notable that


    although many campaigns have a positive lift, quite a few format-campaign combinations

    (e.g., banners in campaign 5, rich media in campaign 2, and TV in campaign 13) are below

the baseline of 0.25. These numbers could be indicative of a poor campaign that broke previously established brand-message links or, as we will explore with our initial-conditions methodology, cases in which the baseline should actually be lower.
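Relative to this industry baseline, the naive lift is simply the recall rate minus 0.25; a quick illustration using the three format-campaign combinations noted above:

```python
# Naive lift under the industry assumption of random guessing among
# four listed brands (y0 = 0.25), using recall rates from Table 4.
baseline = 0.25
recall = {
    "banner, campaign 5": 0.15,
    "rich media, campaign 2": 0.24,
    "TV, campaign 13": 0.18,
}
lift = {k: round(v - baseline, 2) for k, v in recall.items()}
print(lift)  # all three fall below the random-guess baseline
```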

    To formally assess the differences between recall rates for Internet formats and television,

    we aggregate across campaigns. Table 5 reports the results of comparing the average recall

    rates for campaigns that used Internet formats to the recall rates for TV for these campaigns.

    For campaigns that ran some banner ads, the average brand-message recall of banners is 0.45,

    whereas it is 0.50 for TV ads, with the difference having a p-value of 0.01. Similarly, among

    the campaigns running rich media, the recall is 0.37 for rich media, but significantly greater

at 0.46 for TV. The video ads' recall is significantly greater than that of TV (0.50 versus 0.44) in

    those campaigns airing some video ads. Based on these data, we might therefore conclude

    that TV outperforms banner ads and rich media in terms of brand recall, whereas video

    outperforms TV.
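The aggregation behind such comparisons can be sketched as a paired t-statistic over per-campaign recall differences; the differences below are hypothetical stand-ins, not the study's data:

```python
import math

# Paired t-statistic over per-campaign (format recall - TV recall)
# differences; the numbers are hypothetical stand-ins.
def paired_t(diffs):
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

diffs = [-0.05, -0.03, -0.08, -0.02, -0.06, -0.04]
t = paired_t(diffs)
print(round(t, 2))  # negative: this format recalls worse than TV here
```

Pairing within campaign controls for campaign-level differences in creative quality and brand strength, which is why the comparison is restricted to campaigns running both formats.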

Table 5: Comparison of average recall rates for Internet formats vs. TV across campaigns in the in-flight sample. Campaigns that do not use a given online format were excluded.

             avg. recall  t-stat  p-value
banner       0.45         -3.24   0.01
TV           0.50
rich media   0.37         -3.92   0.00
TV           0.46
video        0.50         2.88    0.00
TV           0.44

    3.3 Recall Measures: Pre-Campaign Sample

    For this research project, we augmented the in-flight data collection with a set of surveys,

    which were deployed before the advertising campaign was run, to account for pre-existing

differences in respondents' abilities to link the brand and message. As we describe in section 4, these pre-campaign surveys can be used to measure more accurately the lift the ad

    campaign provides relative to an initial condition than by simply assuming that, absent

    advertising, consumers would randomly guess.

Table 6: Percentage of correct linkages of brand and creative across formats and campaigns in the pre-campaign survey sample. Standard deviations are reported in parentheses.

banner  rich media  video  TV
campaign 1  0.38 (0.48)  0.39 (0.49)  0.39 (0.49)  0.37 (0.48)
campaign 2  0.35 (0.48)  0.35 (0.48)  0.38 (0.48)  0.36 (0.48)
campaign 3  0.49 (0.50)
campaign 4  0.27 (0.44)  0.29 (0.45)
campaign 5  0.18 (0.39)  0.17 (0.37)
campaign 6  0.26 (0.44)  0.25 (0.44)  0.27 (0.45)
campaign 7  0.48 (0.50)  0.43 (0.49)  0.47 (0.50)  0.53 (0.50)
campaign 8  0.54 (0.50)
campaign 9  0.41 (0.49)  0.50 (0.50)
campaign 10  0.45 (0.50)  0.41 (0.49)  0.47 (0.50)  0.47 (0.50)
campaign 11  0.10 (0.30)  0.09 (0.29)  0.10 (0.31)  0.08 (0.28)
campaign 12  0.43 (0.50)  0.40 (0.49)
campaign 13  0.16 (0.37)  0.17 (0.38)
campaign 14  0.31 (0.46)  0.34 (0.47)
campaign 15  0.23 (0.42)
campaign 16  0.30 (0.46)  0.34 (0.47)
campaign 17  0.33 (0.47)
campaign 18  0.17 (0.38)  0.18 (0.39)  0.19 (0.39)
campaign 19  0.27 (0.44)
campaign 20  0.24 (0.43)  0.23 (0.42)  0.26 (0.44)

    Preliminary examination of the average pre-campaign brand-recall rates in Table 6 reveals that the recall rates vary substantially across campaigns and that large deviations from a random guess rate of y_{s0} = 0.25 are present. As expected, the correct linkages for the new products (campaigns 5 and 11) are quite low. In line with our intuition, the pre-existing brand knowledge for campaign 5, which is a line extension, is somewhat higher than for the entirely new brand in campaign 11. Campaign 18, which has a low share of TV GRPs (8%), is also characterized by a low level of creative-brand association. By contrast, campaigns with a relatively high penetration and share of TV GRPs have higher creative-brand associations. We do not have enough data to fully document a relationship between the campaign characteristics and the probability of correctly linking a creative to a brand, but sufficient evidence exists to suggest that subsequent analyses should account for, and possibly attempt to explain, the presence of systematic variation.

    Table 7: Comparison of average recall rates for Internet formats vs. TV across campaigns in pre-campaign survey sample. Campaigns that do not use a given online format were excluded.

                    avg. recall   t-stat   p-value
    banner             0.31       -3.82     0.01
    TV                 0.33
    rich media         0.32       -4.78     0.00
    TV                 0.35
    video              0.30       -0.93     0.39
    TV                 0.31

    One notable difference in the pre-campaign recall rates reported in Table 6, relative to the in-flight recall rates in Table 4, is that much less variation is present across formats. This lack of variation is to be expected because the differences across formats in Table 6 are only in the question asked, not in the respondents' past or future exposure to a given format (the questions were asked before the campaign had begun, so the respondents could not have been exposed to the ad).

    Table 7 reports a direct comparison of average recall for each Internet format to that for

    TV. Both banners and rich media perform slightly worse relative to TV (a difference of -0.02

    for banners and -0.03 for rich media), whereas video is statistically indistinguishable from

    TV.
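    The format-versus-TV comparisons in Tables 5 and 7 rest on tests of differences in average recall. A simplified, unweighted version of such a test can be sketched as follows (the counts are illustrative, and the paper's exact test statistic and survey weighting may differ):

```python
# Sketch: two-sample test of the difference between an Internet format's
# average recall rate and TV's, using unpooled variances. Counts below
# are hypothetical, chosen to mirror the banner-vs-TV row of Table 7.
import math

def recall_diff_tstat(k1, n1, k2, n2):
    """t-statistic for the difference between two recall proportions
    k1/n1 (Internet format) and k2/n2 (TV)."""
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# 310 of 1,000 banner respondents vs. 330 of 1,000 TV respondents correct:
t = recall_diff_tstat(310, 1000, 330, 1000)
print(round(t, 2))  # -0.96
```

    With the large pooled samples in the actual data, even the 0.02-0.03 gaps in Table 7 become statistically significant.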

    3.4 Comparison of In-Flight and Pre-Campaign Samples

    For the summary statistics of the pre-campaign sample to be considered a valid baseline to calculate the lift of a campaign, we need to ensure the respondents included in the pre-campaign sample are comparable to the ones surveyed during the campaign. This may not be the case, however, for a number of reasons. First, we are only interested in the effect of the campaign on the exposed individuals, and therefore respondents who are not exposed should receive a weight of zero in our analysis. Survey respondents in the pre-campaign sample by definition have not been exposed to the ad at the time they are surveyed, but their subsequent exposures (if any) have been recorded. We can therefore examine their various media exposures to verify they saw the commercial in the focal campaign and format. Table 8 reports the percentage of the pre-campaign sample eventually exposed to an ad in the focal format and campaign. Although this percentage is quite high for TV, many pre-campaign respondents in the Internet formats were never exposed to the campaign. By contrast, all in-flight respondents have by definition been exposed. To make the samples comparable, we thus need to focus only on individuals who were eventually exposed (X_s = 1).

    Table 8: Percentage of pre-campaign sample who are eventually exposed to focal format (i.e., were asked about the respective format). Campaigns that used only a subset of formats list values in the order of the formats used.

                   banner   rich media   video   TV
    campaign 1     0.70     0.47         0.54    0.83
    campaign 2     0.63     0.41         0.49    0.88
    campaign 3     0.96
    campaign 4     0.59     0.94
    campaign 5     0.53     0.94
    campaign 6     0.69     0.46         0.87
    campaign 7     0.70     0.48         0.40    0.85
    campaign 8     0.98
    campaign 9     0.63     0.94
    campaign 10    0.53     0.64         0.41    0.89
    campaign 11    0.50     0.73         0.46    0.82
    campaign 12    0.76     0.86
    campaign 13    0.60     0.91
    campaign 14    0.78     0.91
    campaign 15    0.98
    campaign 16    0.50     0.98
    campaign 17    1.00
    campaign 18    0.84     0.24         0.84
    campaign 19    0.99
    campaign 20    0.67     0.64         0.94


    A second issue is the extent to which the exposed pre-campaign sample and the in-flight

    sample are similar in terms of exposures to the different advertising formats. As can be seen

    by looking at the averages for both groups reported in Table 9, even those respondents who

    were eventually exposed to an ad in the focal campaign have a different rate of exposure

    than the respondents included in the in-flight sample. In general, those in the pre-campaign

    group have a much higher exposure to TV relative to the in-flight group.

    Thinking about what may explain these differences in media exposures, the different

    sampling time frames emerge as a possible cause. Whereas the in-flight surveys were collected

    for the entire duration of the campaign (anywhere between 4 and 36 weeks), the pre-campaign

    measures were typically collected within a week. To obtain the necessary sample size to

    ensure we would have an adequate group of individuals who are eventually exposed to the

    focal campaign, the selection of respondents had to be much more aggressive, thus yielding a

    potentially different sample. For example, the high TV exposures among the pre-campaign

    sample could be attributed to a greater number of professional survey takers who might

    have overstated TV exposure rates in order to earn more points on rewardtv.com. Using

    a matching methodology, we remove these outliers and create a sample comparable to the

    in-flight group.


    Table 9: Average number of exposures to different ad formats by campaign. Comparison of pre-campaign (left value) and in-flight (right value) samples. Campaigns that used only a subset of formats list pre-campaign/in-flight pairs in the order of the formats used.

                   banner           rich media       video            TV
                   pre.    in-fl.   pre.    in-fl.   pre.    in-fl.   pre.     in-fl.
    campaign 1      5.61    4.74     3.22    2.94     2.45    1.98    10.75     2.87
    campaign 2      3.79    3.08     2.08    1.68     2.87    2.04    16.67     3.94
    campaign 3     21.96   13.7
    campaign 4      4.89    6.27     7.34    1.48
    campaign 5      3.19    3.41    13.91    6.23
    campaign 6      5.7     3.72     3.44    3.82     4.29    2.31
    campaign 7      6.58    5.6      3.4     2.82     6.81    4.03    11.51     6.41
    campaign 8     12.57    1.56    11.49
    campaign 9      5.79    3.68     1.85    1.67    11.45    2.65
    campaign 10     7.26    9.47     5.48    6.46     3.65    3.48    17.16     2.51
    campaign 11     3.28    8.78     4.39    4.49     3.57    4.38    10.65     2.5
    campaign 12     4.18    5.55     4.23    1.46
    campaign 13     4.25    4.5      4.33    2.56
    campaign 14     2.31    3.02     2.95    2.7
    campaign 15     1.85    1.23
    campaign 16     2.33    1.84    11.54    8.55
    campaign 17     5       2.2
    campaign 18     9.91    7.76     4.8     3.23    22.89   15.81
    campaign 19     2.17   12.02
    campaign 20     6.12    5.22     8.63    6.77    11.31    9.42


    4 Matching Methodology

    Given our pre-campaign survey data, we conceptualize lift as

    \Delta = E[Y_{s1} - Y_{s0} \mid X_s = 1] ,

    where Y_{s1} indicates correct association of the message and brand during the campaign by respondent s and Y_{s0} indicates correct association before the campaign.3 Numbering the surveys before the campaign as {1, ..., S_0} and those during the campaign as {S_0 + 1, ..., S_1 + S_0}, we would ideally measure

    \Delta = \sum_{\{s \mid s > S_0, X_s = 1\}} Y_{s1} w_{s1} - \sum_{\{s \mid s \le S_0, X_s = 1\}} Y_{s0} w_{s0} ,    (2)

    where the weights w_{s1} and w_{s0} ensure the surveyed in-flight and pre-campaign individuals are representative of the population of exposed individuals. We cannot, however, estimate the above equation because we do not observe w_{s0}; that is, the weights are only calculated for the individuals surveyed during the campaign. Furthermore, the analysis and discussion in section 3 indicate the pre-campaign group is systematically different from the in-flight group for which we observe the weights. To prune the non-representative pre-campaign respondents, we employ a matching procedure that restricts the analysis to each in-flight survey and its nearest neighbors from the pre-campaign group.

    4.1 The Matching Estimator

    We match each in-flight respondent s surveyed during the campaign with a set M_s of pre-survey respondents based on a set of variables Z_s that we describe below. Then we estimate the following:

    \Delta = \sum_{\{s \mid s > S_0, X_s = 1\}} \sum_{m \in M_s} (Y_{1s} - Y_{0m}) \frac{w_{1s}}{|M_s|} .    (3)

    3 Our notation for s equates surveys and respondents. Given the sampling approach described in section 3, a given respondent could potentially fill out multiple surveys in the repeated cross section. We currently cannot separate such cases to treat them specially.


    In the above expression, M_s is the set of pre-campaign respondents that are matched to in-flight respondent s, and Y_{0m} indicates whether the m-th matched pre-survey respondent correctly recalled the brand. We divide by the number of matched respondents, |M_s|, such that the total weight for each in-flight respondent s is equal to that respondent's reported weight, w_{1s}.
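    A minimal sketch of the estimator in equation (3), assuming each in-flight record carries its weight and the recall outcomes of its matched pre-campaign respondents (the field names are hypothetical, and the weights are renormalized to sum to one here):

```python
# Sketch of the matched lift estimator in equation (3). Each in-flight
# respondent contributes (Y1s - Y0m) for every matched pre-campaign
# respondent m, with the weight w1s spread equally over the |Ms| matches.

def matched_lift(in_flight):
    """in_flight: list of dicts with (hypothetical) keys
       'y1'      -- 1 if the in-flight respondent recalled the brand, else 0
       'w1'      -- the respondent's sampling weight
       'matches' -- pre-campaign recall outcomes (0/1) of matched respondents
    """
    num = 0.0
    den = 0.0
    for r in in_flight:
        m = len(r['matches'])                # |Ms|
        num += sum((r['y1'] - y0) * r['w1'] / m for y0 in r['matches'])
        den += r['w1']                       # renormalize the weights
    return num / den

sample = [
    {'y1': 1, 'w1': 2.0, 'matches': [0, 1]},  # weighted contribution 1.0
    {'y1': 0, 'w1': 1.0, 'matches': [0]},     # weighted contribution 0.0
    {'y1': 1, 'w1': 1.0, 'matches': [0]},     # weighted contribution 1.0
]
print(round(matched_lift(sample), 3))  # 0.5
```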

    The assumption underlying this estimator is

    E[Y_0 \mid s > S_0, Z] = E[Y_0 \mid s \le S_0, Z] .    (4)

    In words, we assume that conditional on the matching variables, Z_s, the expected response to the pre-campaign survey is invariant to whether the individual was surveyed before or after the campaign began. The assumption therefore guarantees our estimator removes any systematic sampling differences between the pre-campaign and in-flight groups.

    4.2 Matching Variables

    Given our goal is to compare advertising effectiveness across various media formats, we decided to focus on media consumption as the most relevant descriptor of the surveyed individuals. The matching variables Z_s include the total number of campaign exposures for each of the three Internet formats, as well as the total number of TV exposures across all campaigns in our data.

    The Internet formats provide valuable match variables because they are passively observed and thus do not suffer from self-reporting issues. Furthermore, they are highly reflective of the type of individual. Specifically, exposure to the campaign's advertisements signifies the individual is in the campaign's target, and the number of exposures provides a measure of the intensity of viewership of the targeted medium.

    We do not match on television exposures within the campaign because they are not passively observed. A pre-campaign respondent could have been exposed to TV even if we do not observe TV exposure. However, we include the total television exposures across all campaigns so that we are matching on a measure of television viewership intensity. The number of total television exposures also helps us separate out individuals who might take many surveys, because reported television exposures give the respondent the opportunity to take more surveys. Such individuals are down-weighted in Nielsen's estimate of each in-flight respondent's weight, but we need to match on this characteristic to ensure similar down-weighting of pre-campaign respondents who might have reported many exposures.

    We use a nearest-neighbor matching approach (Abadie, Drukker, Herr & Imbens 2004, Abadie & Imbens 2012) in which we find at least one pre-campaign survey to match to each in-flight survey. As Abadie & Imbens (2012) show, allowing individual observations to be used as a match more than once lowers the bias of the estimates.

    We seek exact matches on the campaign's passively observed Internet exposures and allow the overall television exposures to sort among ties in terms of shortest distance. If ties are still present, we include all tied matches, which accounts for |M_s| in equation (3) being greater than 1 for in-flight survey s. As per equation (3), we include the additional matches based on their share of the total matches to s. If no exact match exists, we find the nearest neighbor in terms of the distance between the vector Z_s for the in-flight survey and the corresponding vector for the pre-campaign survey.
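    The matching rule above can be sketched as follows; the record layout and function name are hypothetical, and this is an illustration of the logic rather than the exact production procedure:

```python
# Sketch of the matching rule: exact match on the campaign's passively
# observed Internet exposures, with overall TV exposures breaking ties
# by shortest distance; any remaining ties are all kept in the match set.
# Each record is (internet_exposures_tuple, total_tv_exposures).

def find_matches(inflight, pre_pool):
    """Return indices into pre_pool forming the match set M_s."""
    # Prefer pre-campaign respondents with identical Internet exposures.
    exact = [i for i, (z, _) in enumerate(pre_pool) if z == inflight[0]]
    pool = exact if exact else list(range(len(pre_pool)))
    if exact:
        def dist(i):
            # break ties among exact matches by TV-exposure distance
            return abs(pre_pool[i][1] - inflight[1])
    else:
        def dist(i):
            # fall back to squared distance between full exposure vectors
            a = pre_pool[i][0] + (pre_pool[i][1],)
            b = inflight[0] + (inflight[1],)
            return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(dist(i) for i in pool)
    return [i for i in pool if dist(i) == best]  # keep all remaining ties

pre = [((2, 0, 1), 10.0), ((2, 0, 1), 3.0), ((1, 1, 0), 3.0), ((2, 0, 1), 3.0)]
print(find_matches(((2, 0, 1), 2.5), pre))  # [1, 3]
```

    Keeping all tied matches is what makes |M_s| exceed 1 for some in-flight surveys in equation (3).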

    Our procedure worked well. For the TV format question, we are able to match exactly 96% of the in-flight respondents on the passively observed Internet exposures. Note that we exclude any pre-campaign respondents who do not match in-flight respondents, because our in-flight respondent weights sum to form the true distribution of exposed individuals. For banners the percentage is 84%, followed by video at 75% and rich media at 69%.

    4.3 Causal Interpretation

    Because pre-campaign surveys are conducted well before most of the in-flight surveys (given

    that some campaigns last 4-5 months), time-varying unobservables could make a causal

    interpretation difficult. Moreover, although matching ensures pre-campaign and in-flight

    respondents are comparable in terms of ad exposure and media consumption over the entire


    time frame of the data, it cannot make up for the time gap between the two surveys. Because our goal is to compare exposures across different media formats, our primary concern arises from time-varying unobservables that differ based on the media format to which a respondent is exposed.

    One source of time-varying unobservables we know exists is unobserved television exposures. Due to the inability to passively measure television exposures, we only observe a subset of the actual exposures to TV ads. However, in trying to assess whether Internet formats can build brands comparably to television, unobserved television exposures would likely overstate television effects relative to Internet effects. This overstatement is likely to occur because we should expect individuals exposed to television to watch more television on average than individuals exposed to the Internet, giving television-exposed individuals relatively more unobserved exposures to the ad campaign.

    Other sources of time-varying unobservables include other non-advertising marketing activity by the firm or its competitors. For example, in-store displays do not include messages that would increase association of the message with a brand, but they could increase the salience of the brand in the mind of the customer and therefore increase the focal brand's choice in random guessing. We have no a priori reason to believe television- or Internet-intensive media consumers should see a firm's non-advertising marketing activity at a systematically higher or lower rate. Competitors are likely to target their marketing activity at the same targets as those chosen by the focal brand, and these competitive actions could lead to systematically higher or lower levels of associations as the time since the pre-campaign survey increases. If competitive advertising creates biases in favor of one format over another, we should expect these biases to be increasing with time since the pre-survey. We therefore consider our effects separately for different progressions of our campaigns, measured as the number of previous exposures respondents have to the campaign.


    5 Findings

    We discuss the results from the above matching procedure in the context of two separate yet related research questions. First, by examining the pre-campaign brand recall of the exposed population, we can evaluate whether past advertising or brand experiences have led to a divergence in brand associations between Internet- and television-intensive targets. We find that banner- and rich-media-intensive targets have systematically lower levels of brand recall, which suggests past advertising was either insufficient or less effective for Internet media. Second, the pre-campaign brand-recall measures derived after the matching procedure serve as the baseline in our lift measures. The matched pre-campaign sample allows for a more accurate measure of the campaign lift, and it ensures the lift is more comparable across formats because it takes into account any pre-existing cross-format differences in brand knowledge.

    5.1 Existing Brand Knowledge across Formats

    As some consumers have shifted their media consumption away from television toward various online formats, a concern arises as to whether brand-building activities can be transferred easily across formats. Before we examine the effectiveness of various ad platforms, we consider the lasting effects of past campaigns. Specifically, we measure pre-campaign brand knowledge separately by the media format to which a respondent is eventually exposed (and presumably favors). Although our data do not allow us to infer why, for instance, an Internet-exposed individual may have had less pre-campaign knowledge of the brand than a television-exposed individual, two explanations for the difference in baseline brand knowledge are possible: (i) brands may have devoted fewer past exposures to the Internet formats the individual views, or (ii) past Internet exposures had less persistent effects.

    We are able to assess pre-campaign associations by exposure format because we observe pre-campaign respondents' eventual exposures to the campaign. Table 10 reports the initial conditions based on the matched pre-campaign surveys. These initial conditions differ from the ones reported in Table 6 in that they reflect the responses for only those individuals who are exposed to the format-campaign combination and are matched to an in-flight respondent.

    Table 10: Percent of correct brand associations before each campaign in matched pre-campaign sample. Standard deviations are reported in parentheses. Campaigns that used only a subset of formats list values in the order of the formats used.

                   banner        rich media    video         TV
    campaign 1     0.31 (0.46)   0.34 (0.48)   0.32 (0.47)   0.30 (0.46)
    campaign 2     0.26 (0.44)   0.24 (0.44)   0.35 (0.48)   0.32 (0.47)
    campaign 3     0.32 (0.47)
    campaign 4     0.36 (0.48)   0.36 (0.48)
    campaign 5     0.24 (0.43)   0.08 (0.28)
    campaign 6     0.24 (0.43)   0.23 (0.42)   0.14 (0.35)
    campaign 7     0.42 (0.49)   0.38 (0.49)   0.50 (0.50)   0.62 (0.49)
    campaign 8     0.59 (0.49)
    campaign 9     0.30 (0.46)   0.63 (0.48)
    campaign 10    0.53 (0.50)   0.45 (0.50)   0.47 (0.50)   0.66 (0.47)
    campaign 11    0.07 (0.25)   0.05 (0.21)   0.12 (0.32)   0.07 (0.25)
    campaign 12    0.42 (0.49)   0.42 (0.49)
    campaign 13    0.11 (0.32)   0.13 (0.34)
    campaign 14    0.32 (0.47)   0.66 (0.47)
    campaign 15    0.14 (0.35)
    campaign 16    0.22 (0.42)   0.41 (0.49)
    campaign 17    0.18 (0.39)
    campaign 18    0.16 (0.36)   0.43 (0.50)   0.16 (0.37)
    campaign 19    0.20 (0.40)
    campaign 20    0.11 (0.32)   0.08 (0.27)   0.16 (0.37)

    The primary change in Table 10 relative to Table 6 is that substantial variation in brand

    recall now exists across formats within a campaign. For example, campaign 7 has a TV

    baseline of 0.62, but the baseline is 0.5 or less for the three Internet formats. Alternatively,

    campaign 2 has a high baseline on video and TV at 0.35 and 0.32, respectively, but is close

    to 0.25 for banners and rich media.

    Although the campaign-by-campaign measures are illustrative, our focus is on the averages across campaigns and within format, where the aggregated sample sizes allow us more conclusive inference. Table 11 compares average pre-campaign brand recall for each Internet format to the average pre-campaign brand recall for TV. It also compares the matched estimates with the unmatched estimates. For campaigns running banner ads, we see that


    Table 11: Difference across formats in the percentage of correct brand associations in pre-campaign sample. Comparison between matched and unmatched samples. Asterisk denotes a significant difference at the 5% level.

                    unmatched   matched   exact matches
    banner             0.31       0.28         84%
    TV                 0.33       0.36         97%
    rich media         0.32       0.26         69%
    TV                 0.35       0.35         96%
    video              0.30       0.32         75%
    TV                 0.31       0.30         97%

    the initial condition for those exposed to banner ads dropped to 0.28 with matching, which

    is significantly lower than the 0.36 for TV. Rich-media matched initial conditions are also

    significantly lower than TV at 0.26. Video is indistinguishable from TV in both the matched

    and unmatched samples. We suspect video and TV may be similar, because many of the

    video ads were for online viewership of episodes from television series (e.g., through Hulu).

    The banner and rich-media differences from TV are worth considering. The fact that the

    target audience for the online ad campaigns has a lower level of existing brand knowledge

    than the target audience exposed to TV suggests advertisers efforts to reach this population

    have been ineffective thus far. The TV population is more familiar with the brand message

    and is thus better able to correctly link the commercial to the corresponding brand. This

    finding could be the result of insufficient or ineffective past advertising to Internet-intensive

    media viewers.

    5.2 Comparison of Advertising Lift across Formats

    The metric we use to compare the performance of the different advertising formats is the

    campaign lift, calculated as the difference in the brand-recall measure between the matched

    pre-campaign sample and the in-flight sample (see equation (3)). Table 12 reports the lift by

    campaign and format. We observe a dramatic effect across all formats for the new brand in

    campaign 11. Similarly, we find large and significant effects for campaign 20 (banners, rich


    Table 12: Adjusted lift by campaign and format. Asterisk denotes significance at the 5% level. Campaigns that used only a subset of formats list values in the order of the formats used.

                   banner   rich media   video    TV
    campaign 1      0.10     -0.05        0.07    0.11
    campaign 2      0.13      0.00        0.06    0.05
    campaign 3      0.21
    campaign 4      0.00     -0.01
    campaign 5     -0.09      0.23
    campaign 6      0.20      0.20        0.27
    campaign 7     -0.08     -0.04       -0.05   -0.20
    campaign 8      0.20
    campaign 9      0.54      0.15
    campaign 10    -0.02      0.05        0.33    0.02
    campaign 11     0.30      0.31        0.48    0.43
    campaign 12     0.17      0.06
    campaign 13     0.27      0.05
    campaign 14     0.15      0.18
    campaign 15     0.20
    campaign 16     0.38      0.07
    campaign 17     0.37
    campaign 18     0.30      0.12        0.39
    campaign 19     0.33
    campaign 20     0.28      0.41        0.22


    Table 13: Aggregate lift comparison across formats.

                    avg. lift   t-value   p-value
    banner             0.17       1.56      0.12
    TV                 0.14
    rich media         0.12       0.38      0.71
    TV                 0.10
    video              0.19       1.60      0.11
    TV                 0.14

    media, and TV), campaign 19 (banners and TV), campaign 15 (banners and TV), campaign

    7 (banners and TV), and campaign 6 (TV and video). Some campaigns have a much greater

    banner lift than TV (e.g., 9 and 16). Rich media provides the highest lift in campaign

    20. Video outperforms other formats in campaigns 10 and 13. These differences suggest

    more exploration is needed when data become available for a larger number of campaigns

    in order to establish a relationship between campaign characteristics and

    the effectiveness of the media vehicles.

    Comparing the average performance of the advertising formats across the campaigns, we

    find all Internet formats perform slightly better than TV, with video having the highest

    relative lift at 0.05, banners at 0.03, and rich media at 0.01. However, the p-values for video

    and banners are only 0.11 and 0.12. Thus, accounting for the differences in pre-existing

    brand knowledge by format leads to a different inference regarding the relative performance

    of TV versus the online advertising formats. By only comparing in-flight recall rates, TV

    appears to be the most impactful medium, but adjusting for the initial conditions, Internet

    formats perform just as well or perhaps even better.

    One question that arises in comparing in-flight lift across campaigns is whether respondents were exposed the same number of times across campaigns at the time they are surveyed. Table 14 reports the average number of exposures to the surveyed format for each Internet versus TV comparison. Exposure rates among the banner- and rich-media-exposed/surveyed


    Table 14: Average number of exposures to focal format vs. TV at the time a respondent takes a survey in focal format.

                    exposures   t-value   p-value
    banner             2.57       4.91      0.00
    TV                 1.86
    rich media         2.58       3.78      0.00
    TV                 1.86
    video              2.24       1.40      0.16
    TV                 2.01

    Table 15: Adjusted lift by number of exposures prior to survey for the pairwise comparison to TV.

                          banner   rich media   video
    1 exposure              0.01      -0.04      0.06
    2 exposures             0.01       0.07      0.08
    3 exposures             0.03       0.03     -0.08
    4 exposures            -0.02      -0.03     -0.08
    5 exposures            -0.14      -0.18     -0.18
    6 exposures             0.23       0.13     -0.07
    avg. diff. in lift      0.03       0.01      0.05

    Note: * denotes significance at the 10% level, ** significance at the 5% level.

    are significantly greater than TV exposures for the same campaigns. Recall, however, that

    not all TV exposures are observed. Video exposures are also greater than TV exposures,

    though the difference is not statistically significant.

    Table 15 reports the difference in lift between TV and each Internet format separately by

    the total number of exposures to the campaign. Once we condition on exposures, we do not

    see any format performing systematically better. Even for a given exposure level, only one

    comparison yields a significant result (banner lift 0.23 greater than TV at six exposures). Overall, this finding suggests that the ability of Internet exposures to produce lift measures

    comparable to those of TV is not due to systematically greater numbers of exposures to the

    campaign.
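    Splitting the matched lift by prior exposure count, as in Table 15, can be sketched as follows (the records and field layout are hypothetical):

```python
# Sketch: averaging the matched lift separately by the respondent's
# number of exposures prior to the survey. Each record is a pair
# (exposures_before_survey, y1 - mean_of_matched_y0); values are
# hypothetical, not taken from the paper's data.
from collections import defaultdict

def lift_by_exposures(records):
    sums = defaultdict(lambda: [0.0, 0])
    for n_exp, diff in records:
        sums[n_exp][0] += diff   # accumulate lift contributions
        sums[n_exp][1] += 1      # count respondents at this exposure level
    return {n: s / c for n, (s, c) in sorted(sums.items())}

records = [(1, 0.0), (1, 0.2), (2, 0.1), (2, 0.3), (3, -0.1)]
print(lift_by_exposures(records))  # {1: 0.1, 2: 0.2, 3: -0.1}
```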


    6 Conclusions

    In this research, we propose a methodology for establishing a format-specific baseline to

    assess the lift in brand recall due to an advertising campaign. We supplement the in-flight

    brand-message surveys with a set of pre-campaign surveys and match the pre-campaign

    respondents to those eventually exposed to the campaign in order to control for pre-existing

    brand knowledge. The rich data set we have, tracking the response to TV and Internet

    advertising for 20 campaigns across a variety of industries, provides us with comparable

    measures to assess the relative performance of the different advertising formats.

    We find a systematically lower level of brand knowledge among individuals who are

    surveyed about banner and rich media. Without a format-specific baseline, a researcher

    might therefore draw the wrong conclusion and ascribe too much importance to TVs effect

    on brand recall. Once the difference in pre-existing knowledge is taken into account, there

    is no significant difference in the effectiveness of TV and Internet ads in terms of correct

    brand identification. This result underscores the importance of pre-campaign surveys and

    our matching methodology for comparing ad performance across media formats.

    The goal of our research was to assess the widely held belief that TV outperforms Internet

    formats as a brand-building platform, and we therefore focused on head-to-head comparisons

    of TV to the Internet formats. Nevertheless, as advertisers decide how to use these various

    formats, knowledge of the complementarities between media will be important. Researchers

    in marketing have long explored the potential synergies in multimedia communications (see,

    e.g., Naik & Raman (2003) or Dijkstra, Buijtels & van Raaij (2005) for recent examples), but the empirical study of the phenomenon in a field setting is still challenging. Studies

    that randomly vary TV and Internet pulses across geographic markets may be best suited

    to disentangle the optimal combination and sequencing of ad formats. This more detailed

    analysis was not possible in our context, where most advertising was at the national level, so

    a focus on brands involved in geo-targeted campaigns may be the most promising approach.

    Another avenue for future research would be to investigate more formally the link between


    category characteristics and effectiveness of different types of campaigns. Our study and the

    existing literature on advertising point to a number of potentially relevant brand and category

    factors such as the maturity level of the category, the stage of the product life cycle (new

    introduction versus established brand), and the amount of previous advertising, possibly as

    share of voice in the category. In addition, the type of consumer decision making in the

    product category (whether it is a low-involvement or a high-involvement process) will also

    likely play a role in determining what media format will be most effective.

    Finally, our research can be extended by practitioners to include cost measures in comparing the relative performance across ad formats and guiding the media budget allocation decisions. As of now, online advertising still appears to be more cost-effective. We anticipate, though, that once the brand-building potential of Internet formats has been firmly established, the prices for online advertising will increase to reflect their relative performance.


    References

    Abadie, A., Drukker, D., Herr, J. & Imbens, G. (2004). Implementing matching estimators for average treatment effects in Stata, The Stata Journal 4(3): 290–311.

    Abadie, A. & Imbens, G. (2012). Bias-corrected matching estimators of average treatment effects, Journal of Business and Economic Statistics 29(1): 1–11.

    Assmus, G., Farley, J. U. & Lehmann, D. (1984). How advertising affects sales: Meta-analysis of econometric results, Journal of Marketing Research 21(1): 65–74.

    Briggs, R. & Hollis, N. (1997). Advertising on the web: Is there response before click-through?, Journal of Advertising Research pp. 33–45.

    Dijkstra, M., Buijtels, H. & van Raaij, F. (2005). Separate and joint effects of medium type on consumer responses: A comparison of television, print, and the internet, Journal of Business Research 58(3): 377–386.

    Dreze, X. & Hussherr, F.-X. (2003). Internet advertising: Is anybody watching?, Journal of Interactive Marketing 17(4): 8–23.

    Franzen, G. (1994). Advertising Effectiveness: Findings from Empirical Research, NTC Publications, Henley-on-Thames, U.K.

    Goldfarb, A. & Tucker, C. (2011). Online advertising, Advances in Computers, Vol. 81, Elsevier.

    Heckman, J., Ichimura, H. & Todd, P. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme, Review of Economic Studies 64: 605–654.

    Hutchinson, W. & Moore, D. (1984). Issues surrounding the examination of delay effects of advertising, in T. Kinnear (ed.), Advances in Consumer Research, Vol. 11, Provo, UT: Association for Consumer Research, pp. 650–655.

    Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review, The Review of Economics and Statistics 86(1): 4–29.

    Krishnan, S. & Chakravarti, D. (1999). Memory measures for pretesting advertisements: An integrative conceptual framework and a diagnostic template, Journal of Consumer Psychology 8(1): 1–37.

    Lewis, R. & Reilley, D. (2011). Does retail advertising work?, Technical report, Yahoo! Research.

    Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B. & Steve, M. E. (1995a). How advertising works: A meta-analysis of 389 real world split cable TV advertising experiments, Journal of Marketing Research 32: 125–139.

    Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B. & Steve, M. E. (1995b). A summary of fifty-five in-market experimental estimates of

    the long-term effects of advertising, Marketing Science14(3): G13340.

    Naik, P. & Raman, K. (2003). Understanding the impact of synergy in multimedia commu-

    nications, Journal of MArketing Research40(4): 375388.

    Rossiter, J. & Bellman, S. (2005). Marketing Communications: Theory and Applications,

    Pearson Education.

    Wells, W. (1964). Recognition, recall and rating scales, Journal of Advertising Research

    4(3): 28.