How good is good enough?

Jayne Van Souwe and David Bednall

Synopsis

The concept that any information is better than nothing explains why DIY research, straw polls and

meta-data analysis of partial data are becoming so popular and the results are being treated as

reliable and accurate by the users of them. As former scientific researchers we were concerned

that “near enough” is not “good enough”.

This paper outlines results from two pieces of original research designed to find out who responds

to surveys by contact and response mode and who doesn’t. It sheds new light on our thinking and

calls into question the representativeness of even the best-designed and executed surveys.

We shared our preliminary work at the 2014 AMSRS Conference in Melbourne¹. We now have

even more evidence from every mode of recruitment and most completion modes.

We will give our views on the best uses of different sample frames for different applications: from a Census, which must be taken and used by government for the purposes of providing services and future planning, down to online panels and street intercepts, which can be very handy in specific circumstances.

Background

Having come from scientific backgrounds we have always wanted data to be accurate and robust.

When one of the authors was talking with a client about the shortcomings of capturing some

information needed to make a decision she said, “Look I have to make this decision. At the

moment I have no information so it’s the toss of a coin, ANY information you can give me that helps

to make it, however rubbery, will be appreciated” - so much for being accurate to within +/- 2% at

the 95% confidence interval!

The large number of studies that are now carried out on metadata, river samples and convenience

samples demonstrate that there are many users of research for whom data accuracy and

robustness are not valued as highly as getting results quickly and/or cheaply.

It is interesting to note that the ESOMAR Market Research Handbook starts by defining “marketing

intelligence”.

“The purpose of marketing intelligence is to provide management with the facts, information and

insights it needs to rapidly make the best, most efficient business decisions” – Fredrik Nauckhoff

¹ Bednall, D. et al, "Access all people: the 3M approach", AMSRS Conference, Melbourne, 2014


This is depicted in the following word cloud:

The definition of market research as stated in the AMSRS Code of Professional Behaviour is:

“The systematic gathering and interpretation of information about individuals or organisations using

the statistical and analytical methods and techniques of the applied social sciences to gain insight

or support decision making. This differs from other forms of information gathering in that the

identity of participants will not be revealed to the user of the information without explicit consent and

no sales approach will be made to them as a direct result of their having provided information.”

The first thing to note is that the definition of marketing intelligence includes value judgements such as "best" and "most efficient", as well as the adverb "rapidly". The short and punchy definition of marketing intelligence sounds fast paced and there is a sense of movement.

Market research sounds altogether ponderous in comparison!


We market researchers are pragmatists, and we know that our endeavours only form a part of the

broader intelligence framework. We also know that the days of the dedicated market research

buyer have gone and that research is now purchased by people with broader titles like Insights

Managers, Knowledge Gatherers and so on, as well as the more traditional product or service

marketing and communications personnel.

We face the issue of remaining relevant if we’re plodding along systematically while the world is

making decisions on the fly, however that isn’t the topic of this paper. This paper concerns itself

with the nature of the information that we do provide on which decisions will be made, and the

accuracy and precision that are necessary in order to be “good enough”.

Sampling – the mainstay of market research – which sample is good enough?

If we’re going to be systematic in gathering information of any type, we must first define the people

we need to talk with and then work out where to find them. It’s obviously easier to talk with people

in a known population, for example, customers of an organisation who use a particular product or

service and whose contact details are known to the organisation. It’s harder when we want to talk

to people and we don’t yet know exactly who they are, as would be the case in new product

development or where we really need to speak to everyone in order to be sure that we have a clear

idea of their views.

Having defined who we need to speak with, the next issue is the bane of all researchers’ lives, but

especially junior researchers, that is, to source a contact list or suitable sample of them.

When we are trying to get a fix on a population we generally take a random sample of that

population. This is especially important if we don’t understand its characteristics.

Simple random sampling "is a probability sampling procedure that ensures that every sampling unit

making up the target population has a known, equal, non-zero chance of being selected." (Hair &

Lukas, 2014, p.252). As the most fundamental probability sampling method, we use it as the basis

for projecting our sample results to the population with a known degree of confidence.
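The definition above can be sketched in a few lines of code. This is illustrative only; the frame of 10,000 IDs and the seed are made up:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from a sampling frame without replacement.

    Every unit has the same known, non-zero inclusion probability,
    n / len(frame), which is what lets us project sample results
    to the population with a known degree of confidence.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: 10,000 customer IDs
population = list(range(10_000))
sample = simple_random_sample(population, 300, seed=42)
print(len(sample), len(set(sample)))  # 300 300 -- no duplicates
```

Note that the guarantee only holds over the frame we draw from; units missing from the frame have a selection probability of zero, which is the coverage problem discussed throughout this paper.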

A year ago, we compared respondents from a reputable online panel with telephone respondents (fixed line and mobile), and gave everyone the chance to complete the survey by phone or online (Bednall et al, 2014). In fact we also used mail, but timing conspired against us and the response was so low as to make the results unusable. The matched samples in that study attempted to gain co-operation from exactly the same types of respondents, demographically, from each of the fixed line telephone, mobile and online panel frames. We achieved this for mobile and online panels but in the

interests of good practice did not fill all quotas with younger people via fixed line phone because of

the enormous number of phone calls we were making to do this. We could not justify contacting

several hundred members of the public in order to find one in-scope respondent.


We found that people who responded in the different modes had different attitudes and behaviours

to each other even if they seemed to be the same demographically.

In our most recent study, we added face-to-face interviewing into the mix in order to contact young

people and were very surprised at the excellent response rate achieved by this method. All

respondents were invited to participate in an online survey and were recruited face to face or by

dual frame (fixed line and mobile) telephone interviewing. People who did not want to complete the survey online (or were recalcitrant) were allowed to complete the survey on the spot if recruited face-to-face, or by telephone if recruited by that means. Responses to questions relating to method

of contact are shown in Table 1.

Table 1: Methods of accessing the general public

Base: All respondents (excludes don't knows)
QD2. Which of the following do you have access to?

                                                                     AGE
                                                        TOTAL    18-29    30-49    50+
                                                       (n=643)  (n=222)  (n=148)  (n=273)
                                                          %        %        %       %
A phone (net)                                            99       99       99      100
A fixed phone (land line) in your home                   74       44       68       96
A mobile phone (net)                                     94       99       98       89
A smartphone that connects to the internet               68       95       87       41
A mobile phone (not a smartphone)                        33        8       20       54
A mailbox (net)                                          94       87       96       95
A letterbox                                              91       85       94       94
A post office box                                        14       16       15       12
An internet connection (excl. phone)                     89       90       95       86
An internet connection at home                           84       82       92       81
An internet connection at work or in a
  public place (e.g. a library)                          56       61       78       43
A tablet that connects to the internet
  (by Wi-Fi / 3G / 4G / LTE etc.)                        46       46       61       38

Source: Wallis Multi Frame Multi Mode Omnibus, Dec 2014

What is immediately clear is that amongst the individual means of making contact, letterboxes have the highest penetration into the community, followed by home internet. In aggregate, it is possible to contact the entire population by phone (fixed or mobile), and most of the population by mail and the internet. These figures, though obtained from a multi-mode survey, are very consistent with ACMA data, which is sourced from another comprehensive survey².

So what's the problem? We can access everyone, can't we? Let's leave the practical matters of access in the various contact modes aside for the moment and think about the fact that everyone can be contacted somehow.

² ACMA, Communications Report 2013-2014; see http://www.acma.gov.au/theACMA/Library/Corporate-library/Corporate-publications/communications-report


Research has shown that response rates are low and dropping in many modes (Bednall et al,

2013), so even when we can contact someone by a particular method, we have to worry about non-

response – that is, are the people who don’t respond the same or different from those who do?

Knowing that people generally respond in the mode they’re contacted in and looking at Table 1

suggests that different people respond in different modes as well.

There is also the matter of preference. Just because people have a means of contact does not mean that they will respond via it. Our research both last year and this year showed that people don't expect organisations they don't know to contact them by some methods, as shown in Chart A.

Chart A: Expected means of contact from organisations and people you know

Base: 859

Q1: How do you prefer people you know to contact you?

Q2: Overall, what is the main way that you prefer organisations to contact you?

Source: Deakin/Wallis Multi Mode Multi frame Omnibus, 2013

People expect communications by mail (electronic or hardcopy) from organisations they don’t know

as well as from people they know. However, when answering their mobile device or landline phone

or viewing an SMS, they are largely expecting it to be a communication from someone they know.

This should be particularly the case for those whose landline is on the “Do Not Call Register”,

though the exemption for charities has meant a large proportion of calls are likely to come from

these sources. Call screening is a likely response, reinforcing the point that people prefer to use

this medium for personal, not business, communication (see Chart E).

Chart B shows these data broken down further by age group. This chart shows the proportion of

people in different age groups who are accessible by a medium and who expect to be contacted by

organisations in this medium.


Chart B: Proportion of people by age group who are accessible and expect

organisations to contact them in this medium

Base: 859

Q1: Which of the following do you have access to?

Q2: Overall, what is the main way that you prefer organisations to contact you?

Source: Deakin/Wallis Multi Mode Multi frame Omnibus, 2013

People of all ages expect organisations to contact them by e-mail above any other means. On the

surface this is good news for people launching surveys online. However, as we know too well,

people are not logical and fail to behave in the way they believe they will.

Further, while organisations with e-mail addresses of their customers or targets can, and do, survey

their populations by this means, there are no good lists available for the whole population. Even

when using those lists that do exist, researchers must take great care in their approach that they

are complying with the relevant legislation, not least of which is the SPAM Act (2003).

People’s preference for the online medium has undoubtedly fuelled the growth and use of online

panels as fast, efficient and, hopefully, pleasant means of reaching Australians. Unfortunately,

while access to the internet is now practically universal in Australia, our previous study estimated

that only about one in five Australians had ever registered for an online panel and about one in six

is still on one – fewer again are active. We also demonstrated that people on panels are invited to

complete surveys at very much greater frequency than by other modes and to complete many more

surveys. Nonetheless, the pool of willing respondents in total is quite large.


Chart C: Estimated access to the Australian public by all major electronic modes

Source: Deakin/Wallis Multi Mode Multi Frame Omnibus, 2013

There is known to be considerable overlap between panellists – that is, panellists tend to be on

more than one panel. Chart D shows the age profiles of people on the major panels – which are

disguised. It also shows the extent of overlap between the largest eight panels operating at the end

of 2013.

Chart D is interesting in that it shows quite clearly why some online panels are becoming

increasingly protective of younger respondents – there are not relatively more of them in

comparison to other age groups – and the over 60 age group is the largest single age group.


Chart D: Age breakdown of panel members

Base: Respondents on a panel (398)

Q5: Which panel(s) do you belong to?

Source: Deakin/Wallis Multi-Frame Multi-Mode Omnibus, 2013.

Finding young people who are willing to participate in survey research has always been a challenge

and it remains so with panels as well.

It is particularly important in the context of Road Safety to interview this group since it remains

overrepresented in road accidents. The Transport Accident Commission in Victoria has the dual

mandates of reducing road trauma on Victoria’s roads whilst providing financial support to those

who are injured (or killed) on them. To this end, it is important for the TAC to understand the actual

prevalence of attitudes, beliefs and behaviours so that it can deploy its resources appropriately.

Being a public body, the evidence upon which it acts must be transparent and credible.

The TAC has kindly given permission for data from two of its flagship surveys to be shared to

demonstrate how people behave in practice. Some of this data is published, but other data is as

yet unpublished.

The annual Road Safety Monitor (RSM) and the ongoing Public Education Evaluation Programme

(PEEP) are considered by the Victorian government to be of sufficient importance that the TAC has

been given access to the complete database of licensed drivers in order to conduct them. Both

studies recruit participants initially via a letter of invitation to randomly selected Victorian license

holders. The RSM includes a self-completion questionnaire as well as giving people who do not

respond in that mode the ability to go online and complete it, or to wait for a telephone call and

complete it that way. This survey runs periodically with sufficient time allowed for data capture to

enable multiple follow ups to increase the proportion of the initial sample to participate (TAC, 2014).


PEEP similarly invites people to participate via a letter, but because it is time sensitive, it asks

Victorian motorists to go online and complete the survey, or wait for a phone call. Interviewing must

be completed within a week.

Address information is accurate, but not all records have telephone numbers, so both studies use the Sensis telephone matching service to find numbers where none are held.

Table 2: Completion mode for the TAC RSM and PEEP, by age

               RSM (n=928)                    PEEP (n=5,176)
            Hard Copy   Online   Phone        Online   Phone
                %          %        %            %        %
18 - 25        33         48       19           32       68
26 - 39        37         50       13           41       59
40 - 59        48         36       10           39       61
60+            78         18        4           34       66

Source: Transport Accident Commission, by kind permission.

The response rate amongst the youngest age group is the lowest in both studies (we have not shown the response rates themselves since they are not directly comparable). Table 2 shows that young people will go online of their own volition, but both studies also enjoy success when respondents are phoned. PEEP does particularly well in this medium because of its tight timeframe and the fact that the survey is considerably shorter than the RSM. These data debunk the idea that, in practice, young people have a preference (and perhaps a greater willingness) for completing surveys online, or indeed in any particular mode.

The information presented so far has demonstrated that there is no single sample frame that gives equal access to all people.

Is the Dual Frame telephone sample the answer?

Dual frame samples or mobile-only surveys appear to give the highest penetration into the community and now approximate the "good old days" when everyone had a fixed line phone and most were listed in the telephone directory. In practice this is not the case.

Just because people have a phone does not mean that they will answer it. Our research both last year and this year showed that many people use blocking tactics to screen contacts from people they don't know.

As the left-hand pie in Chart E shows, a slim majority of mobile phone owners try to answer calls immediately, but the remainder have other strategies. The people who try to answer immediately also have compelling reasons for doing so: a high proportion of them are tradespeople and small businesspeople who clearly need to answer their phones to generate or operate their business.


Chart E: Actions taken by people when their mobile phone rings

Base: All respondents with mobile phone or fixed line phone

QD2b / Q1f: When your mobile / landline rings what do you usually do?

Source: Wallis Multi Frame Multi Mode Omnibus Dec 2014, Deakin / Wallis Multi Frame Multi Mode Omnibus 2013

The pie chart on the right hand side demonstrates graphically what happens with fixed line

telephones. Again, while the majority of people try to answer the phone immediately, a third of

people use screening tactics.

Taken together this means that many people will not answer a phone call if they do not recognise

the number calling them or if the organisation or person calling them does not leave a compelling

message. Once again, it is far from guaranteed that everyone does have an equal chance of being

included in this frame, given a large proportion of the public disqualifies itself from answering

unsolicited phone calls.

There are other problems with incorporating mobile phones:

- Ethical: it is imperative to make sure that any respondent is safe, physically and in other ways, to answer any call, but with mobiles the challenges are greater. With most samples that are available, we don't know the location of the phone we're calling, meaning we must be careful not to call people outside the hours permitted by law.

- Respondent goodwill: our industry has done itself no favours with the public in foisting long, often boring and often irrelevant studies on it. In mobile mode it is incumbent on us to do the right thing: keep to interviewing length guidelines and make the experience good. As we have seen, the public is not expecting to be called via this medium to participate in market research.

Dual frame interviewing is certainly an advance on many of the other frames available and is

becoming more widely used in Australia and overseas.

Should we go back in the future?

Referring back to Table 1 shows that virtually all Australians have access to a mailbox. We have shown that this technique can be very useful for making initial contact, particularly if the survey is short, simple, pleasant or extremely relevant to the individual. It can be very effective on its own where the contact details of the population of interest are available, such as a customer listing. It does not work so well if you're simply writing "to the householder" with a long and tedious questionnaire on something that is not a high priority for the recipient. Nonetheless, there are many personalised mailing lists available, and for some surveys it remains a viable technique. However, its known flaws of low response rates and slowness, quite apart from recent increases in the costs of mailing, paper and administration, have relegated it from the mainstream.

Face to face interviewing is an old technique which is enjoying something of a revival. We used it

to excellent effect to interview young people in our latest multi-mode, multi-frame omnibus.

Interviewers can be located where respondents are likely to congregate making it an efficient

means of finding respondents of certain types. Tablet and notebook computers have replaced pen

and paper, bringing the benefits of computerised scripts to the streets, and people, particularly

young people, are so taken aback at being approached that a surprising number agree to an

interview. Nonetheless, the costs involved in using only face to face interviewing as a means of

gaining information from the entire Australian public mean that this is not viable in most cases. Like

self-completion mailed surveys, it is a useful addition to the arsenal rather than a magic pudding.

The future – Take me to the river?

New technology allows access to a wide array of electronic communications that can be fished,

skimmed or accessed. DIY survey packages encourage researchers to answer their questions in a

variety of ways. The more reputable operators mention the need for good sampling practices, and

some supply respondents to researchers who have no suitable sample available to them. Some

encourage putting survey links in places where they are likely to be seen by potential respondents –

or river sampling.

River sampling is a tricky business. Just as a fast flowing river can fill a receptacle placed into it

very quickly, so can a well-placed link to a survey be filled quickly by responses. The issue is that

the type of respondent will differ depending on where the survey link is placed – and there is

generally little way of ensuring that people responding have an equal chance of being included in

the first place. The name river sampling is apt – any hydrologist will tell you that the composition of

the river is highly variable and what you capture in your net or bucket will vary greatly depending on

which part of the river it is placed in. The same is true in survey research. As with other means of


accessing the public, using river sampling as a means of finding a specific group of people can be

valid, but as a proxy for the entire population it is clearly skewed towards the stream that it is placed

in. This may well be good enough to give a sense of general sentiment (and it is usually bad

sentiment that permeates the ether), but as a basis of systematic research, it is limited.

Using pop-ups and placements within advertorials to capture the views of in scope respondents can

also be problematic. Our most recent piece of research has shown that a high proportion of people

using the internet employ a range of strategies to block what they see. Over a quarter of

Australians, for example, block ads and pop ups regularly and this rises to four in ten people aged

under 30.

The current gold standard

Multi-frame interviewing offers the best way to access all members of the Australian public. One of

the key challenges for surveys that use multiple sampling frames is the possibility that an individual

might be in-scope in more than one of the sampling frames. This means that all people in the

starting sample do not have an equal probability of being selected and causes problems for

statisticians both in working out how accurate estimates are and how to weight the data to correct

for sampling biases (Ansolabehere & Shaffner, 2014; Hu et al, 2014; Maia, 2011; Pfefferman &

Rao, 2009).

Much work has gone into appropriate ways of weighting the data using a range of ways to assess

the probability of contact and normalise for it (Barr et al, 2014; Berzofsky et al, 2009; Callegaro et

al, 2011; Levrakis, 2013; Lohr, 2000 – 2011; Ridenhour et al, 2013; Yeager et al, 2011).
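One simple adjustment in this spirit can be sketched as follows. This is an illustration of the general idea only, not the exact method of any of the papers cited above, and it assumes (a strong simplification) that selection is independent across frames: a respondent's overall inclusion probability is one minus the probability of being missed by every frame they belong to, and their weight is its inverse.

```python
def multi_frame_weight(frame_probs):
    """Inverse-probability weight for a respondent reachable via several frames.

    frame_probs: the respondent's selection probability in each frame
    they belong to. Overall inclusion probability is
    1 - product of (1 - p_i), assuming independent selection across
    frames (a simplifying assumption made for illustration).
    """
    p_missed_by_all = 1.0
    for p in frame_probs:
        p_missed_by_all *= (1.0 - p)
    return 1.0 / (1.0 - p_missed_by_all)

# A person on both a landline frame (p=0.01) and an online panel (p=0.05)
# is easier to reach, so gets a smaller weight than someone reachable
# only by landline.
w_both = multi_frame_weight([0.01, 0.05])
w_landline_only = multi_frame_weight([0.01])
print(w_both < w_landline_only)  # True
```

The practical difficulty, of course, is knowing which frames each respondent belongs to and with what probability, which is why surveys must ask about panel membership, phone access and so on.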

In both of our multi-mode surveys we found that the closest approximation to ABS published

statistics (and other published data that we had captured as a sanity check) was achieved simply by

adding data from the different sample frames together. The reason for this seems to be that while

people may have many means of being contacted, they have a mode preference both for contact

and completion. Talking with respondents in our latest survey also suggested that most people

would only be available and willing to answer in any one mode.

Having said this, we’re not suggesting that the answer is to use so many different starting samples

that it is impossible for members of the public to escape the net, or that we don’t take the possibility

of overlap seriously. We are continuing our work in the area of best practice in ways of handling

multi-frame (and multi-mode) data for community based surveys and commend readers of this

paper to the references we've provided for those attempting what is still a new and very complex

means of surveying. However, in our view, multi-framing is the best means currently available to

give thorough access to the entire Australian public.

The following Chart demonstrates our best estimate of the way in which the population can be

contacted by mode:


Chart F: Means of contacting the Australian Public in Practice by Mode

Source: Deakin/ Wallis Multi Frame Multi Mode Omnibus 2013/2014

How many people should we speak with to be “good enough?”

We’ve demonstrated that there is no one sample or mode of interviewing that gives access to the

entire public. However, clearly this is not always necessary.

For specific audiences, different samples work well – for example, it is possible to gain the co-

operation of older females through fixed line telephones, and tradespeople by calling mobile phone

numbers during the working day. Of course, where research is to be conducted with a known

population, listings of customers, service users or potential customers give the best starting sample

of all, although we suggest that to gain the opinions of the widest range of people it is necessary to

have multiple contact points – address, phone numbers and e-mail address.

In the context of making estimates of the wider population though, we return to statistics 101 and

pose the question, “how many people do we need to speak with to give reliable estimates?” In a

true random sample the answer is usually 300 (because this gives estimates accurate to ±4-6% at

the 95% confidence interval) or if you really want to analyse sub-groups within this population,

1,500 because then overall error is reduced to ±2-3% at the 95% confidence interval and you can

analyse up to five evenly sized sub-groups with reasonable precision. Or can we?
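The sample sizes quoted above follow from the standard margin-of-error formula for a proportion estimated from a simple random sample, z * sqrt(p(1-p)/n), evaluated here at the most conservative case of p = 0.5:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(300) * 100, 1))   # 5.7, within the +/-4-6% quoted
print(round(margin_of_error(1500) * 100, 1))  # 2.5, within the +/-2-3% quoted
```

Each evenly sized sub-group of a 1,500-person sample contains 300 people, which is how the two figures connect. But the formula assumes simple random sampling, which is exactly the assumption that multi-frame overlap undermines.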


The more I find out the less I know?

Where multiple sample frames are used, there is highly likely to be considerable overlap as most

people will appear in more than one frame. Thus our ideal simple random sampling paradigm

cannot apply as people will not have an equal chance of being selected. This raises two questions.

Firstly, as we build multi-frame samples, how do we determine accuracy? Secondly, if the sample

frame is skewed, will interviewing more people give a more accurate result or support more detailed

analysis?

To assist with both, it is clearly important to have a means of sense-checking any information, so surveys should be designed to gather data suitable for that purpose. Unfortunately, as we've seen, with most starting samples that do not have total coverage (and none really do in terms of likely response) this cannot be information such as demographic data. It is possible to make four people,

if chosen carefully, represent the whole of Australia – but no-one is seriously thinking that four

people can be relied upon to give a robust result. However, they may be relied upon to act as

expert witnesses.

Table 3 shows a comparison of answers to the question, “What are young Australians’ favourite

foods?”

Table 3: Australia's favourite foods amongst people aged 18–29

Jamie Oliver (N=1)     Matt Preston (N=1)            Herald Sun Poll, May 2015 (PureProfile, N=1,000)
Pavlova                Seafood platter               Vegemite
Fish and chips         Burger                        Meat pie
Potato cakes           Mixed grill                   Pavlova
Dim sims               Fish and chips                Steak
Steak                  Pumpkin soup                  Macadamia nuts
Sausage rolls          Salads                        Lamingtons
Burgers                Avo and Vegemite on toast     Kangaroo
Lamingtons             Sticky date pudding           Chiko Rolls
Pie                    Pavlova                       Dagwood Dogs
BBQ shrimp             Tim Tam slam                  Iced VoVos

The results are remarkably similar. Two of these lists (Jamie Oliver's and Matt Preston's) are based on personal opinions formed over many years of visiting and living in Australia, as well as their views on popular culture here. The Herald Sun poll is keen to tell readers the number of respondents to the survey as a means of demonstrating statistical rigour.


Ever since the Oracle at Delphi the opinions of experts and elder statesmen have been used to

provide guidance. The Delphi technique is still very much used and is a useful tool – but it depends

upon drawing on past knowledge and wisdom.

While the past is often the best predictor of the future, "think tanks" don't always get it right, and nor do the experts. They will be closer to the mark where the answers are generally known. It's unlikely that the results presented in Table 3 were a surprise to the reader. However, right or wrong, the data are unlikely to be of great importance to anyone except dieticians or someone wanting to set up a fast-food chain.

Happily, according to the ABS3, what we like and what we eat are very different.

However, on the opposite side we see how very wrong statistically “robust” opinion polls can be on

such matters as voting intentions, as was witnessed in the UK earlier this year. The polls were so wrong that the UK has initiated a public inquiry, with results expected by the middle of 2016. Something was clearly amiss with the sampling, the mode or the invitation to participate – or perhaps the population really had no idea what it was going to do at the time of asking and was too polite to refuse to answer.

Random Sampling or Sampling at Random?

We have demonstrated that most sampling frames available for conducting community surveys do

not give each person in the frame an equal chance of participation, yet the statistics we apply to

determine their accuracy rely on this. The problem is compounded across frames (Lohr, 2011).

Weighting may help if we have a reasonable method for estimating frame overlap (Barr et al., 2014), but that is not always available.
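The overlap adjustment described by Barr et al. can be sketched in its simplest form: if the two frames are sampled independently and we can estimate each respondent's chance of selection in each frame (say, from questions about the phone services they use), the design weight is the inverse of the combined inclusion probability. The probabilities below are illustrative, not drawn from any study.

```python
def multiframe_weight(p_a: float, p_b: float) -> float:
    """Design weight for a unit selectable from two independently
    sampled frames A and B: 1 / P(selected via A or B)."""
    p_overall = p_a + p_b - p_a * p_b
    return 1.0 / p_overall

# A mobile-only person (no chance via the landline frame) vs. a dual user:
print(multiframe_weight(0.0, 0.01))   # mobile only    -> 100.0
print(multiframe_weight(0.01, 0.01))  # in both frames -> ~50.3
```

Without this kind of adjustment, people reachable through both frames are over-represented roughly in proportion to their extra chances of selection.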

In this brave new world of non-probability sampling, we believe it is time to consider alternative

ways of stating how accurate information is.

Does it really matter whether we state that data are accurate to within +/- x% at a particular

confidence interval, or is it just as useful to give some other means of determining the reliability or

variability of the data? Where some existing data are available, it is possible to use Bayesian estimation to give credibility intervals, and this approach is growing in popularity (Roshwalb et al., 2012).

However, this is not possible where there is no existing data to use as the basis for estimation.
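Where a prior wave of data does exist, the Bayesian approach can be sketched with a simple Beta-Binomial model; all the counts below are invented for illustration. A hypothetical prior wave of 100 interviews finding 50% agreement becomes a Beta(50, 50) prior; combined with a new wave of 60 "yes" out of 200, the posterior is Beta(110, 190), and the 95% credibility interval can be read off by simulation:

```python
import random

def beta_credibility_interval(prior_yes, prior_no, new_yes, new_no,
                              draws=20000, seed=7):
    """95% credibility interval for a proportion under a Beta-Binomial
    model, estimated by sampling from the posterior distribution."""
    rng = random.Random(seed)
    a = prior_yes + new_yes   # posterior alpha
    b = prior_no + new_no     # posterior beta
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws) - 1]

# Hypothetical counts: prior wave 50/100 "yes", new wave 60/200 "yes"
low, high = beta_credibility_interval(50, 50, 60, 140)
```

The prior here acts like extra interviews, which is exactly why the technique needs some existing data to lean on.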

In this case, two possible techniques are bootstrapping and jackknifing; the jackknife was originally suggested in 1958 as a means of comparing results from very small sample sizes (Tukey, 1958), with the bootstrap following later (Efron, 1982). These techniques work by examining the variability within the data themselves. They used to be arduous to apply, but most statistical packages now have them built in, and they are enjoying something of a resurgence as a result (Lohr, 2010). Here is an example of how bootstrapping can be applied using data on eye colour from the 2013 multi-mode, multi-frame study. People were asked to describe their eye colour. The table excludes those people who would or could not say. The figures relate to the percentage of the whole sample that gave that response.

3 ABS Cat No: 4364.0.55.007 - Australian Health Survey: Nutrition First Results - Foods and Nutrients, 2011-12

Table 4 Bootstrapped estimates of eye colour

Base: 859

Qi1: What colour are your eyes?

Surprisingly, we can find no Australian data, so we can say ours are the definitive estimates! Some

US estimates are provided for interest4. The bootstrapping procedure takes account of the gender

and age strata in our samples. Because it uses a resampling with replacement technique, it models

a range of possibilities, including possible samples drawn largely from a single mode. It shows our

best estimates – from simply combining the samples, to our bootstrapped 95% confidence intervals.

Moving to this approach allows us to talk about ranges within which results should be considered

“accurate” rather than putting a range around a number per se. This can work in the same way in

practice, but requires a mind shift away from displaying results as statements of fact or absolute

truths towards results that tell a story with a general margin of reliability.
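A minimal, unstratified version of the procedure can be sketched as follows (the study's version also respects the age and gender strata, and the counts below are illustrative, not the survey data):

```python
import random

def bootstrap_ci(responses, target, n_boot=2000, seed=1):
    """Percentile-bootstrap 95% confidence interval for the share of
    `responses` equal to `target`, using resampling with replacement."""
    rng = random.Random(seed)
    n = len(responses)
    shares = sorted(
        sum(r == target for r in rng.choices(responses, k=n)) / n
        for _ in range(n_boot)
    )
    return shares[int(0.025 * n_boot)], shares[int(0.975 * n_boot) - 1]

# Illustrative sample of 859 answers with a brown-eye share near 33%
answers = ["brown"] * 285 + ["not brown"] * 574
low, high = bootstrap_ci(answers, "brown")
```

Because each resample is drawn with replacement from the combined data, the procedure naturally models a range of possibilities, including resamples dominated by a single mode.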

Interpretation is the key?

Based on the statistically sound Ipsos MORI Poll the following headline appeared in the UK

recently:

Today’s key fact: you are probably wrong about almost everything

Most people around the world are pretty bad when it comes to knowing the numbers behind the

news. But how issues such as immigration are perceived can shape political opinion and

promote misconceptions.

4 see http://brandongaille.com/eye-color-percentages-and-statistics/

Colour   Fixed   Mobile   Panel   Total   Bootstrapped 95% CI     Data from US
                                          Lower       Upper
Brown    29.9%   35.8%    33.2%   33.2%   30.3%       36.1%       41.0% Brown
Blue     32.4%   28.7%    33.5%   31.5%   28.6%       34.5%       47.0% Blue/Grey
Green    14.3%   13.2%    11.6%   12.9%   10.7%       15.1%       12.0% Green
Hazel    17.6%   12.5%    13.5%   14.3%   12.1%       16.6%
Grey      0.8%    2.7%     4.7%    2.9%    1.9%        4.1%
Other     1.6%    4.1%     2.2%    2.7%    1.6%        3.8%


A number of questions were asked of people around the world. Chart G shows the actual proportion of migrants in each country as the burgundy bar, with the estimated proportion marked at the end of the bar. The orange segment represents the gap between the actual and estimated figures. The figures represented in this Chart are almost certainly accurate – however accuracy is measured.

Australia has the highest proportion of migrants in its population, and its respondents made the most accurate estimate of this proportion. Canada has a lower actual proportion of migrants, yet Canadians' estimate was similar to the Australians'. The US, Italy, Belgium and France all made estimates of around 30% – but the reality is quite different. In every country, the estimate is higher than the actual proportion.

Chart G Difference between actual and estimated migration rates

Q: Out of 100 people how many do you think are migrants to this country?

Source: Ipsos MORI


Bobby Duffy, Managing Director of the Ipsos MORI Social Research Institute, said about these data:

“The real peril of these misperceptions is how politicians and policymakers react. Do they try to

challenge people and correct their view of reality or do they take them as a signal of concern, the

result of a more emotional reaction and design policy around them?

Clearly the ideal is to do a bit of both – politicians shouldn’t misread these misperceptions as people

simply needing to be re-educated and then their views will change – but they also need to avoid

policy responses that just reinforce unfounded fears.”

As Bobby rightly points out, it is how the data are used that is the issue, not the facts themselves.

So size doesn’t really matter as much as careful design and reporting. The actual numbers in this

example are less important than the fact that there are gaps at all and what underpins them.

Which brings us back to PURPOSE!

So how good is good enough?

Why is the information needed?

How will it be used?

These are the two most important questions we need to ask ourselves as researchers and indeed

this is absolutely common to both marketing intelligence and market and social research.

They can also guide us in deciding how to design studies fit for purpose.

Clearly at the top end are data-gathering exercises whose results must be beyond question. The most obvious examples are the various surveys conducted by the Australian Bureau of Statistics, with the mother of them all being the Census of Population and Housing.

A census is a very expensive exercise. The 2011 census is reported to have cost about $440

million. However, if we didn't have it and just one new major piece of infrastructure was built in the wrong place as a result, the cost of that mistake alone would outweigh the savings! On a household basis this is just under $60 per

household to give accurate information for governments at all levels and business to use in their

planning – and for market and social researchers to use as a sanity check or weighting base!

Could it be less accurate? Australia is acknowledged as having extremely high-quality information, which is justified on the basis of a rapidly growing population with rapidly changing needs. Earlier this year, when the government floated the idea of scrapping the census, there was

a large amount of negative comment. These plans appear to have been shelved.


The ABS has grasped the need to get everyone involved and is able, with force of law, to compel households to respond. Households are not the only sampling frame used, however – the census goes beyond them to capture homeless people, and it can be completed in hard copy or online. It is, therefore, a multi-frame, multi-mode study.

Next in order of importance are studies where it is imperative to make absolute estimates of the

prevalence of behaviour, attitudes or beliefs. Studies such as those mentioned earlier for the TAC

fall into this category, as do studies on the state of public health and actual rates of crime. Most of these studies are now carried out in Australia by census-like approaches or, where sampling is involved, by the use of multiple starting sample frames.

At the other extreme, when making a small extension to a product line it may be less expensive to

use an experimental design and simply place the product with some potential users or in store,

rather than going to the expense of a full blown survey to gauge uptake. For example one of the

authors was asked the following question some years ago while working as a Market Research

Manager for a major confectionery manufacturer…

Should Freddo frog wear a bow tie?

The use of market research funds to answer such a question was ridiculous. All that needed to be done was to watch a number of children eat a Freddo. The answer is simple: children (and some adults) eat Freddo head first or feet first – most would have no idea what he wears (or whether he wears anything). A more relevant question is whether the cost of changing the mould to make Freddo with a bow tie could be justified. Clearly it could not be.

The prevalence and ease with which many surveys can be completed has led some organisations

to collect data, sometimes enormous amounts of it, because they can. More is not better if the

information is not captured appropriately, or systematically.

Collecting information quickly is not necessarily a good thing. For example, in assessing

community attitudes towards such things as new developments in local areas, the most vocal

community members are those who are strongly opposed or strongly in support of the development.

If surveys deployed by any means are run overnight or very quickly, they will pick up answers from

the people who have vested interests, not the balanced view that comes out in time, when everyone

is given a chance to participate and give their opinions.

So it comes down to the risk involved in the decision to be made. In general terms, the bigger the

risk (and that is not only financial) that may be involved in the decision, the more accurate a study

should be. We’re defining accurate as:

"The ability to match the population of interest as closely as possible with those questioned, and to ask sufficient people to make their answers reproducible."


Obviously, the more that is known about the characteristics of the population, the easier it is to match them. Conversely, the less that is known about that population, the harder this is to do and the more compelling the argument for using all methods at the researcher's disposal.

In our view we should always strive for the best and in doing this recognise that “good enough” in

the context of survey research is a function of:

1. How much we know already

2. How risky the decision is that needs to be made and therefore how much tolerance we have for

considering possible decisions on the basis of confidence intervals rather than a single estimate

3. How quickly we need to make the decision

4. How much money we are prepared to spend to make the decision

There is nothing new in this and researchers have been trading these factors off officially in

Australia for 60 years. What has changed is that we generally have some information available to

us as a starting point, the speed with which decisions are being made is increasing and the relative

size of the budget available is reducing.

Conclusions and Recommendations

In this paper we have attempted to point out the shortcomings in all survey techniques but also to

offer some solutions.

The information presented here causes us to conclude that:

1. No one sample frame gives total coverage of the Australian community, and even those that appear to do so fall short because of response-mode preferences.

We recommend that researchers use multiple sample frames wherever possible when

attempting to make estimates for the entire community and understand the limitations

where this is not possible. We recommend challenging the use of probabilistic statistics in

stating the error margins on single frame or single mode samples and using alternative

means of understanding variability in the data.

2. Samples do not need to be big to be reliable - they should reflect the population of interest as

closely as possible.

Where the characteristics of the population are not known, we highly recommend including

questions that have known answers available from other sources that can be used for the

purposes of cross referencing to gauge the extent of the fit – in other words as a sanity

check.

3. The bigger the decision, the more accurate the evidence needs to be.

We recommend matching the sample to the size of the decision in both its coverage and accuracy. The accuracy of the estimates can be calculated in a number of ways – we do not need to rely on probability sampling to give an idea of the reproducibility and variability of the data.


4. Research based on known users or potential users of a product or services provides the best

starting sample but is still not immune to biases – including those introduced if only a single

mode of completion is allowed.

We recommend that respondents should be allowed to respond in the way or ways that

suit them best.

5. The new gold standard is a single listing of the population of interest together with accurate and

up-to-date contact details for multiple ways of contacting that individual.

In practice we seldom have such lists, so if budget and time permit:

Build in multi-mode contact

Build in multiple frames

If using single frame or single mode, be careful how the findings are reported. In most

cases it is no longer possible to estimate error limits based on random sampling principles,

however we can provide guidance on the level of variability of the data.

6. It is good enough if the research design gives results that are within the risk level for the

decision to be made.

We recommend an open debate between people commissioning and using research to

ensure that the uses of the information and its potential limitations are clearly identified.

We conclude with the following table and diagram, which offer a fast ready-reckoner to the optimum

approach to help researchers to design research that is good enough and fit for purpose every time.

We have this within our power – it is up to all of us to be passionate about good practice - and make

it happen now and into the next 60 years!

Table 5: Samples and their uses and abuses


Chart H: Guide to How Good is Good Enough?


References

Ansolabehere S & Schaffner BF (2014) Does Survey Mode Still Matter? Findings from a 2010 Multi-Mode Comparison. Political Analysis, Oxford University Press on behalf of the Society for Political Methodology.

Barr ML, Ferguson RA, Hughes PJ & Steel DG (2014). Developing a weighting strategy to include

mobile phone numbers into an ongoing population health survey using overlapping dual-frame

design with limited benchmark information. BMC Medical Research Methodology, 14(102), 1-9.

doi:10.1186/1471-2288-14-102

Bednall D, Van Souwe J, Fine B & Bishop B (2014) Access all people: The 3M Project AMSRS

Conference, Melbourne, Victoria, Australia

Bednall D, Spiers M, Ringer A & Vocino A (2013), Response Rates in Australian Market Research,

Deakin University, School of Management and Marketing, Melbourne, Vic.

Berzofsky M, Williams R & Biemer P (2009) Combining Probability and Non-Probability Sampling

Methods: Model-Aided Sampling and the O*NET Data Collection Program Survey Practice 2 (6)

Callegaro M, Ayhan O, Gabler S, Haeder S & Villar A (2011) Combining Landline and Mobile Phone Samples: A Dual Frame Approach. GESIS – Leibniz-Institut für Sozialwissenschaften, 13.

Efron B (1982) The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia:

Society for Industrial and Applied Mathematics

Hair JF & Lukas B (2014) Marketing Research. Fourth Edition. McGraw-Hill Education, Sydney,

Australia.

Hu SS, Balluz L, Battaglia MP & Frankel MR (2011). Improving public health surveillance using a

dual-frame survey of landline and cell phone numbers. American Journal of Epidemiology,

173(6), 703- 711. doi: 10.1093/aje/kwq442

Lavrakas PJ (2013) Recent Developments in Dual Frame RDD Surveys. Presentation to the Australian Market and Social Research Society, Melbourne, Victoria, Australia.

Lohr SL (2011). Alternative Survey Sample Designs: Sampling with Multiple Overlapping Frames.

Survey Methodology, 37, 197-213.

Lohr SL (2010) Sampling: Design and Analysis Second Edition. Arizona State University,

Tempe, Arizona, USA.



Lohr SL & Rao JNK (2006). Estimation in Multiple-Frame Surveys. Journal of the American

Statistical Association, 101, 1019-1030.

Lohr SL & Rao JNK (2000). Inference in Dual Frame Surveys. Journal of the American Statistical

Association, 95, 271-280.

Maia M (2009) Indirect Sampling in Context of Multiple Frames JSM 1769-1777

Pfeffermann D & Rao CR. (Eds.) (2009) Sample Surveys: Design, Methods and Applications,

Vol. 29A, (pp. 71-88) Elsevier, The Netherlands: North-Holland.

Ridenhour J, Berzofsky M, Couzens G L, Blanton C, Lu B, Sahr TR & Ferketich A (2013) Most

efficient weighting approach in dual frame phone survey with multiple domains of interest AAPOR

Annual Conference, Boston, Massachusetts.

Roshwalb A, El-Dash N & Young C (2012) Towards the Use of Bayesian Credibility Intervals in

Online Survey Results IPSOS Public Relations

Simon JL (1997). Resampling: the New Statistics. Second Edition Resampling Stats.

Tukey JW (1958) Bias and Confidence in Not Quite Large Samples, Annals of Mathematical

Statistics, 29, 614.

Yeager DS, Krosnick JA, Chang L, Javitz HS, Levendusky MS, Simpser A & Wang R (2011)

Comparing the Accuracy of RDD Telephone surveys and Internet Surveys conducted with

probability and non-probability Samples, Public Opinion Quarterly, 75 (4), 709-747.