
ASSESSING THE USABILITY OF AN ELECTRONIC ESTABLISHMENT SURVEY INSTRUMENT IN A NATURAL USE CONTEXT

Lelyn D. Saner, Statistical Research Division, and Kimberly D. Pressley, Economic Planning and Coordination Division

U.S. Census Bureau, Washington, D.C. 20233

Methods for evaluating the usability of survey instruments vary depending on format and stage of development, but it is often difficult to determine how users interact with such instruments in their natural environment. Several methods were used to evaluate the usability of one survey instrument. We examined existing sources of feedback on the instrument to see whether consistent problems were noted. Also, Census Bureau staff interviewed representative users at their workplace to clarify problems that were noted. We found these methods effective for gathering information on how users interact with survey instruments and for identifying instrument-based obstacles to effective response completion.

Key Words: User-Centered Design, Interface, Computerized Self-Administered Questionnaire

1.0 INTRODUCTION

New methods for collecting statistical information are emerging to take advantage of rapidly improving information technology (March, 1993; Nichols and Sedivi, 1998). Such methods can potentially result in a reduction of time costs for those people in establishments who are responsible for government reporting. Although traditional paper reporting will continue to be an option, several electronic methods can speed up the process. However, at the same time that possibilities such as these become more feasible, new sets of issues arise that must be faced as well. Different types of interactivity will require different methods of navigation through data collection instruments, and these methods may be unfamiliar to the respondent. Mechanisms for automatic calculation of figures and for information verification might increase the completeness and accuracy of the data, but the ways in which these mechanisms are implemented will need to be clear and straightforward to the respondent. The ability of the respondent to complete the instrument may also be highly dependent on the quality and capability of the available computer hardware that the person is using. As such, if we view electronic establishment surveys as tools designed for collecting data, we can think of them in terms of their usability, and we can view the respondents of the surveys as the primary users.

In this paper, we outline the concept of usability, we discuss it as a property of a system, and we describe how it is built into the system throughout the development life-cycle. We then review the usability issues that were addressed in the testing of the Annual Survey of Manufactures Computerized Self-Administered Questionnaire (ASM-CSAQ). Finally, we address the testing methods used for the ASM, and suggest that they might be applied to other similar applications at similar stages of development.*

2.0 USABILITY AND USER-CENTERED DESIGN

The system is the mechanism that has been constructed by people to perform tasks specified by the user, such as storing and organizing information, performing calculations, and so forth. The user interface is a tool that facilitates communication between a system and its user (Mayhew, 1999). As noted by Murphy et al. (1999), the user interface includes displays and controls through which the system receives input from the user and translates the input into instructions that the system can follow. It also takes the output from the system and translates it into feedback that is meaningful to the user. This feedback allows the user to make judgements and decisions about how to proceed with the task. According to Dumas and Redish (1993), “usability means that people who use the product can do so quickly and easily to accomplish their own tasks.” Mayhew (1999) notes that “usability is a measurable characteristic of a product user interface that is present to a greater or lesser degree,” but a distinction must be made between the “functionality” of a system and the “usability” of the user interface. Functionality refers to what the system itself can do, while usability refers to the ability of the user to exploit those capabilities without feeling frustrated.

Usable interfaces are compatible with the user’s working environment, reflecting a workflow and a design concept that is familiar to the user. They support the user’s learning style with terminology and illustrations that the user can understand and with a consistent presentation of information (Hackos and Redish, 1998). The value of usability engineering is that it “provides structured methods for optimizing user interface design” (Mayhew, 1999). The sooner usability criteria can be defined specifically, the better the interface that is eventually produced, but as Norman (1988) points out, “unless you actually test a number of units in a realistic environment, you are not likely to notice the ease or difficulty of use.”

Although surveys can be tested in the lab, the best way of determining how users actually interact with the survey instruments when they are responding to them is not always clear cut. Every individual user of an electronic survey completes the survey in a unique environment, with a unique task situation and different demands on their attention. Whether it is the workplace or the home, this environment is what we refer to as a "natural use context." Taking a tip from Norman (1988), we speculated that environmental factors might affect how people use the ASM-CSAQ and, therefore, we had to determine which methods would be most appropriate for investigating this possibility.

Usability can and should be integrated throughout the actual design process, and although boundaries between stages of interface development are often fuzzy, some usability techniques clearly lend themselves better to some stages than others. Early on, the developer should lay out a conceptual model of what the system is supposed to do and identify the characteristics of those who will be using it the most. This is best done before any actual designing takes place. Knowledge of who the users are involves determining what tasks they want to accomplish, what skills they have, how much experience they have doing the tasks, what type of environment they are used to working in, and how they learn new tasks. Often an understanding of the task is based on observation of different people doing it. A task analysis (Kirwan and Ainsworth, 1993) both identifies the steps required to reach the task goal and maps out how they fit together in a sequence. It is important that the developer base the conceptual model of the system on some concrete information about what needs to be accomplished and by whom. This is a first step in what is known as “user-centered design.”

In the mid-cycle, as the developer constructs more specific and detailed plans for the interface, user-centered style guides are consulted. Early prototypes of designs are reviewed by experts in usability engineering so that problems can be solved before they become expensive. These methods are used iteratively, building on what was found before. In the late cycle of development, people who will use the tool are asked to participate in usability testing. They perform typical tasks while the tester observes any problems they encounter. These can be remedied before the instrument is released. Developers can also install user tracking methods to obtain feedback from users when they are in their own work environment. Data obtained from these mechanisms can be used as a starting point for re-design. That is the point we were at with the ASM. An existing system that had been used for several years was due for redesign. We had several sources of feedback from experienced users and we used them to identify critical points of focus for the evaluation.

3.0 THE ANNUAL SURVEY OF MANUFACTURES

The ASM is completed by companies in the United States that maintain manufacturing facilities, and it is designed to assess changes in manufacturing performance. The data collected contributes to the development of national economic policies. Each company is expected to report all activities within its establishment, including manufacturing, fabricating, processing, and assembling, among others. In 1993, the Census Bureau had an electronic version of this survey developed that ran in MS-DOS. In 1998, it was upgraded to a graphical Windows-based user interface. This survey format, referred to as a Computerized Self-Administered Questionnaire (CSAQ), is a stand-alone, executable program, and several other economic surveys have also been constructed in this format. The program is sent to companies on a 3.5” floppy diskette with an instruction sheet for installing and setting up the program on a personal computer. Companies may elect to report using either the electronic or the paper format.

The 1998 version of the CSAQ consists of 11 unique screen windows for the survey items themselves, but it is supported by an informative welcome window, multiple windows for help, and several windows for providing feedback and submitting the data to the Census Bureau. The CSAQ allows the respondent to submit data by saving it to the diskette and returning it by mail or by transmitting it directly via a modem. The respondent can also print out a hard copy of the information reported. Since the instrument is installed on their computer, all the information is saved on the hard drive when the survey instrument is shut down. Each CSAQ is customized for the company in terms of the subsidiary establishments on record and the previous year’s reported information, which is presented to the immediate right of each question on the CSAQ. This allows the user to compare the current year’s figures with those last reported. The usability evaluation of the ASM-CSAQ focused on several general aspects of the instrument which, along with the usability concerns under consideration, are described in the sections that follow. The methods used for identifying and testing these issues are discussed later.


3.1 General Navigation

Several functions presented at the top of each page of the survey support navigation through the instrument. One button takes the user directly to the initial window of the instrument, which has introductory material and links to the submission steps. There are other buttons to access help, to open the “Plant Manager” (described below), and to take the user to the Next or Previous pages of the survey. A drop-down menu contains the Census File Numbers (CFN's) for all of the establishments run by that company and allows the user to jump between establishments as the need arises. Another drop-down menu lists all of the items in the survey, allowing the user to jump across multiple items to a specific one.

Our primary usability concern was whether the functions of these buttons and menus were clear to the users. Did the users know that they could jump between item windows and establishments? We were also interested in whether the window-by-window, sequential response format was an effective design or whether navigation would be clearer to the users if all of the items were on a single, scrollable screen. We refer to the sequential format as "item-based" and the scrollable format as "form-based" and discuss our approach to this later.

3.2 Information Edits

One advantage of an electronic survey is that it can be programmed to cue the user when the information being reported may contain errors. In the ASM-CSAQ, the method for alerting users to errors as they are detected is by displaying one of two icons that the user can click on to get details on the problem. One icon indicates a warning, which will not impede submission of the completed survey. Another icon indicates that a user has input information that is invalid. The user is unable to submit the completed survey until these errors are corrected. When the survey for all units has been completed, the user is presented with a review screen that lists all of the errors and warnings that remain and provides hyperlinks to the items that the messages are related to. This gives the user a second chance to check the warnings. At this point all errors must be corrected in order to proceed. The issue here was whether the users noticed the icons and knew what they meant. Did they know the difference between the icons and did they see that they were clickable objects?
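To illustrate the two-tier edit logic just described, here is a minimal Python sketch. The field names, the 50% warning threshold, and the rule details are invented for illustration; the actual ASM-CSAQ edit rules are not specified in this paper.

# Minimal sketch of a two-tier edit check: warnings flag suspicious values but
# do not block submission, while errors must be cleared before the data can be
# submitted. Field names and thresholds here are hypothetical.

def run_edits(response, prior_year):
    warnings, errors = [], []
    for item, value in response.items():
        if value is None or value < 0:
            errors.append((item, "value is missing or invalid"))
        elif item in prior_year and prior_year[item] > 0:
            # Large year-over-year swings are plausible but worth confirming,
            # so they produce a warning rather than an error.
            if abs(value - prior_year[item]) / prior_year[item] > 0.5:
                warnings.append((item, "differs from last year by more than 50%"))
    return warnings, errors

def can_submit(errors):
    # Mirrors the review screen: all errors must be corrected to proceed.
    return len(errors) == 0

warnings, errors = run_edits({"payroll": 90, "employees": -1}, {"payroll": 200})
print(warnings)            # [('payroll', 'differs from last year by more than 50%')]
print(can_submit(errors))  # False until the invalid entry is corrected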

3.3 Plant Manager

The Plant Manager is an information management utility that is unique to the ASM-CSAQ. It allows the user to do some reporting tasks more quickly. Files can be imported into the CSAQ directly. Information can be exported to a data file to keep as a record. Other functions include the ability to search for subsidiary establishments by name, make customized copies of the CSAQ for individual subsidiaries, and merge information from multiple subsidiaries into one response. The issue here was whether or not users knew what functionality was available to them through the Plant Manager, and if so, whether using it would be straightforward.

4.0 METHODS USED TO IDENTIFY USABILITY ISSUES

Our primary interest in this paper is how effective the methods that we used were for our purposes. Determination of the appropriate methodology is important at all stages of the usability evaluation process, but we focus on two stages specifically: identifying usability issues and testing the interface features for them. If the members of the user population have used the interface, it is beneficial to refer to available information on their interaction with the interface to determine what the most critical usability concerns are for the redesign.

4.1 Records of Communication

One source of feedback was a set of communications directed from the users of the CSAQ to the Census Bureau. These phone calls and e-mail messages referred to difficulties with using the CSAQ. This kind of feedback is always helpful, but especially so if the same problem is identified independently by multiple users in different locations. Anytime there are multiple mentions of problems, it is possible that there are larger, underlying usability issues deserving specific attention. The main problems identified through this method had to do with the Plant Manager. Several users had difficulty with the component functions of merging, importing, and dispatching data, and although the set of records used in this case was relatively small, the consistency of their content made them a useful source. We did have some difficulty reading and interpreting some of these messages, since different people recorded them, so we recommend that all the records follow a consistent format. That is, the organization should have a specific, standard form on which essential information from such correspondence is recorded, and more importantly, all staff who might be contacted ought to be told how to fill out this form.

4.2 Feedback Questionnaire Responses

Another way to learn the reactions of the users is through the use of rating scales for various aspects of the user interface, such as screen layout, color scheme, or sequence of screens. The user is asked to rate these features along such dimensions as ease-of-use, logic of layout, clarity of terminology, and so forth. Such rating scales can be embedded into the instrument itself, as is the case with the ASM-CSAQ. They also may be web-accessible or sent separately to the users via mail or e-mail. The ASM-CSAQ has a set of sixteen rating scale items, several general computer use tracking items, and an entry field for suggestions. Users are given the option of responding to these when completing the survey.

The use of the Plant Manager was also identified as a major problem through this method, but several navigation issues emerged as well. Users commented that jumping between units was difficult when only the CFN's, and not the names and locations of the units, were available through the interface. Accessing the "Help" system also had an average rating of "difficult." Information editing was touched on in the open-ended feedback questions, where several users referred to problems with adding units. We found that extreme average ratings on features are a helpful, quantitative method of identifying usability issues.
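As a rough illustration of how extreme average ratings can flag problem features, the sketch below computes per-feature means and reports those at the difficult end of the scale. The feature names, the 1 (easy) to 5 (difficult) scale, and the 4.0 cutoff are assumptions for the example, not details taken from the ASM-CSAQ feedback items.

# Flag interface features whose mean rating falls at the "difficult" end of a
# 1 (easy) to 5 (difficult) scale; the 4.0 cutoff is an illustrative choice.
from statistics import mean

ratings = {
    "screen layout": [2, 3, 2, 1],
    "jumping between units": [5, 4, 5, 4],
    "help system": [4, 5, 4, 4],
}

flagged = {feature: mean(r) for feature, r in ratings.items() if mean(r) >= 4.0}
print(flagged)  # {'jumping between units': 4.5, 'help system': 4.25}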

4.3 Event Log Analysis

One advantage of electronic surveys over paper surveys is that they can capture the user’s response behavior through an event log. Similar to a trace file, as the user goes through the process of completing the survey, every time he or she clicks a button, makes a menu selection, or types something in using the keyboard, the action is recorded as an “event.” All the events are time-stamped with the date and the time, and then identified by event type. For the ASM, there were 12 event types, including beginning and ending a session, clicking on a button, opening, moving, and resizing windows, entering text, selecting from a menu, and so forth. Information about which specific button, field, or menu was used for each event is also recorded. All of the records are listed in an output file that can be imported into statistical analysis tools. By knowing the definitions of the events and the references made in their particular descriptions, it is possible for an analyst to see exactly where a user went, and when, in the course of traversing an instrument.

This was the most useful source of background data on navigation of the CSAQ. In the logs where high transition activity between CFN’s was observed, and the number of distinct CFN's identified them as multi-unit companies, we observed that the survey item numbers remained the same for large blocks of events. This means that the respondents who dealt with the information from more than one unit entered the item information for every unit before moving on to the next item. For companies with one to three units, we observed that the general pattern was to go through the survey from beginning to end for each unit in sequence. The item numbers generally increased under one CFN selection, with occasional, spurious jumps forward or back. Then, when a new CFN selection was made, the item numbers would return to 1 or 2 and start increasing again. Although these patterns do not indicate a particular problem, they do provide qualitative evidence that users in different response situations do employ different strategies to accomplish the same overall task goal, according to the demands of their context.
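To make the log analysis concrete, the following sketch distinguishes the two navigation strategies from an ordered stream of (CFN, item) pairs. The simplified record format and the switch-count heuristic are invented for illustration; the actual ASM-CSAQ log contains richer records (timestamps, 12 event types, and control identifiers).

# Classify a respondent's navigation strategy from an ordered event stream.
# Each event is a (cfn, item_number) tuple; this format is hypothetical.

def classify_strategy(events):
    switches = sum(1 for a, b in zip(events, events[1:]) if a[0] != b[0])
    units = len({cfn for cfn, _ in events})
    if units <= 1:
        return "single unit"
    # Many CFN switches relative to the number of units suggests entering each
    # item for every unit before moving on; few switches (roughly one per unit)
    # suggests completing each unit from beginning to end.
    return "item-by-item" if switches > 2 * units else "unit-by-unit"

# Items increase under one CFN, then the next CFN starts over at item 1.
print(classify_strategy([("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2)]))   # unit-by-unit
# The same item is entered for every unit before moving to the next item.
print(classify_strategy([("A", 1), ("B", 1), ("A", 2), ("B", 2), ("A", 3), ("B", 3)]))  # item-by-item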

5.0 METHODS USED TO TEST USABILITY

Once usability issues were identified, we chose a combination of several common methods that we built into a testing protocol: contextual inquiry, structured interview, and interface walkthrough.

5.1 Contextual Inquiry

Since the ASM-CSAQ has already been out in the field and used by a number of companies, our testing plan was built around the method of “contextual inquiry” (Dray, 1999). Contextual inquiry is an ethnographic approach that allows for interviewing people while they are in the process of doing tasks in their natural work environment. It is a cooperative relationship between the observer and the user in which the user can become a “co-discoverer” of insight into what the task involves. The inquiry might be directed at the atmosphere of the workspace, the roles played by multiple people in the workplace, or the ways in which the environment affects the task being done. Such an inquiry helps to identify specific concerns and its progress is guided by those concerns. For example, observing that a user often has to ask where records of something are kept may prompt the tester to ask several additional questions about how records information is stored and organized at that company. All the respondents who were willing to participate in this evaluation agreed to have Census Bureau staff visit their business location to interview them. Some examples of questions we asked were:

- "What tasks are you responsible for in a typical day on the job?"
- "What do you take care of in a general sense here at the company?"
- "Within your work context, where do you go to get information you may need to complete a task?"

This approach was especially useful to us with respect to the "Plant Manager" functions. For example, when asked which function was most useful, several users identified “Dispatch” and some identified “Search” (Search for a Plant). This is consistent with the fact that all of the users indicated that some or all of the information required by the survey is dispersed among various units of their company. In addition, all of the users who commented said that they prefer having the reporters from the various units return their unit's data to the central office to be compiled and submitted to the Census Bureau by a single person. This data about the flow of information within the companies prompts us to make certain that these functions are always available and understandable to the users.

5.2 Structured Interview

Within the contextual inquiry framework, we attempted to keep the data collection procedure as systematic as possible. This is known as a structured interview, where "the questions and their sequence are predetermined" (Kirwan and Ainsworth, 1993; Dray, 1999). In our implementation of the contextual inquiry as a testing protocol, we had observations and prompts to the interviewer predefined as well. These elements functioned in combination to guide the interviewer's timing throughout the user's response process. On the protocol, the questions were in italics, the observations were in bold type, and the prompts to the interviewers were in parentheses. Table 1 lists several examples of each of them. The "Observation with Prompt" cued the interviewer as to what to do depending on the behavior of the user. In other words, the interviewer would be instructed to ask a question or make an additional observation if a behavior was or was not observed. This particular mechanism allowed us to collect the most detailed information possible about the usability problems.

Table 1: Examples of Structured Interview Items

Component | Example
Questions | "Is it clear to you that 'Welcome' is a link?" / "What does the exclamation point in yellow indicate?"
Question Prompts | (Ask this about halfway through the survey and preferably before they click on it.)
Observation | Note whether they click on the "Welcome" button or not, and if so, do they click on it when it first appears as opposed to coming back to it from lower levels?
Observation with Prompt | Do they note that the boxes present previous or already input information for each CFN as it is selected? (If there is no visible or verbal indication that they do, ask this as a probe at the end.)

There are many ways to construct structured interview protocols. What is important is that they provide enough guidance to the interviewer that he or she can stay focused on the primary usability issues, even while maintaining a flexible "co-discoverer" relationship with the user. We found this to be the most effective way to investigate the information edit issue. First we observed whether or not the users read introductory information that explained the icons or experimented with clicking on the icons. Then, if they did not, we asked them if they knew what the icons meant. If they did click on the icons, we asked users if they understood the message that they saw, and so forth. With pre-defined questions arranged in a specific way, we were able to diagnose problems from all kinds of different user behaviors.


5.3 Interface Walkthrough

The primary task for the users in our case was to complete the 1998 ASM-CSAQ using a contrived set of information. All of the test users had completed their official reporting for the year, but we wanted them to go through the steps so that they would be more likely to re-encounter problems in the mock response process that they had experienced while doing the actual process. This is known as a walk-through method (Kirwan and Ainsworth, 1993; Dray, 1999), and it can be conducted either by having the user actually go through the task, as we did, or by having the user mentally visualize the steps they would take. We wanted the primary focus of the users' attention to be on completing the survey in as natural a way as possible, as they would if no observers were present. We chose this method because of its compatibility with the contextual inquiry format, giving the user a chance to re-trace his or her steps through the interface, while minimizing the influence of the observer.

The technique also helped us specifically investigate the "form-based" vs. "item-based" navigation issue. Once the test users had done an actual walkthrough with the current, item-based ASM-CSAQ, we had them provide feedback on a prototype design of what the ASM might look like as a form-based interface. The prototype was not functional beyond being scrollable and containing all the questions as they would probably appear if it were functional, but being presented with the general layout allowed the users to visualize how they would complete that form and to give some general reactions on that basis. This technique for direct comparison was well received by the test users and provided valuable feedback for future design options such as Internet-based CSAQ's.

6.0 CONCLUSIONS

From our work with the ASM-CSAQ, we found that it is helpful to know how the users of the instrument have interacted with it in the past when re-designing an electronic survey. Although information of that kind is sometimes difficult to obtain, obstacles to usability may be repeated if it is not explored. In this paper, we described several possible sources of such information, as well as some methods for testing usability problems identified in them. On the basis of our observations, we concluded that contextual factors may have a large influence on how an electronic survey instrument is used. The strategies used by different respondents depend on the demands of the task in their particular situation. We also concluded that event logs, in particular, are a rich source of usability data which should be explored further. Overall, we suggest that others who are faced with usability testing tasks similar to ours consider using similar methods for approaching them as well.

7.0 REFERENCES

Dray, S. M. (1999), Tutorial Notes: Practical Observation Skills for Understanding Users and Their Work in Context, ACM.

Dumas, J. S., and Redish, J. C. (1993), A Practical Guide to Usability Testing, Norwood, New Jersey: Ablex.

Hackos, J. T., and Redish, J. C. (1998), User and Task Analysis for Interface Design, New York: John Wiley.

Kirwan, B., and Ainsworth, L. K. (1993), A Guide to Task Analysis, London: Taylor and Francis.

March, M. (1993), "Managing the Quality of Establishment Surveys in an Environment of Rapidly Changing Survey-Taking Technologies," Proceedings of the International Conference on Establishment Surveys, pp. 414-419.

Mayhew, D. J. (1999), The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design, San Francisco: Morgan Kaufman.

Murphy, E., Marquis, K., Hoffman III, R., Saner, L., Tedesco, H., Harris, C., and Roske-Hofstrand, R. (1999), "Improving Electronic Data Collection and Dissemination Through Usability Testing," Proceedings of the Federal Committee on Statistical Methodology Research Conference, pp. 117-126.

Nichols, E., and Sedivi, B. (1998), "Economic Data Collection Via the Web: A Census Bureau Case Study," Proceedings of the Section on Survey Research Methods of the American Statistical Association, pp. 336-371.

Norman, D. A. (1988), The Design of Everyday Things, New York: Doubleday.

* This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform interested parties and to encourage discussion.


ESTABLISHMENT LIST SAMPLES, PROXY REPORTS, AND DATA QUALITY: COVERAGE AND ACCURACY ACROSS SITES

Brian Clarridge, Lauri Scharf, Center for Survey Research, University of Massachusetts Boston

Brian Clarridge, Center for Survey Research, University of Massachusetts Boston,

100 Morrissey Blvd., Boston, MA 02125-3393, [email protected]

ABSTRACT

This paper reports on a data collection effort across Massachusetts Medicaid health care delivery sites in 1998. The goal of the paper is to use information from a directory published by one of the smaller managed care organizations to explore two aspects of the effort: a) coverage across sites and b) accuracy of the proxy reporting.

Keywords: telephone, Medicaid, cultural competence, managed care, research methods

1. INTRODUCTION

Because establishment telephone studies are typically designed to gather information about what happens at a place, rather than what happens to an individual, there is often a complex set of screening questions that must be followed to verify the eligibility of a particular site before an interviewer can actively engage a respondent. Moreover, once the site has been validated as eligible, a second series of screening questions is sometimes necessary to identify a respondent at the site who is capable of reporting the information sought for the whole site. When the two selection protocols are intricate or complex, a researcher can be left insecure at the close of data collection both with respect to the coverage of the sites intended for study and the quality of the information provided by the proxy reporters.

At the end of a recent establishment study conducted on behalf of the Division of Medical Assistance for the State of Massachusetts, we were uncertain enough about our study design and the effectiveness of the questions we used to screen for eligibility that we engaged in an exercise to verify the data and validate the methods we used in the data collection.

Our paper starts at the same point that many establishment studies start, with a list sample of establishment addresses together with their telephone numbers. In our case the list was of physicians’ offices, clinics, and health centers that provide primary care services to Medicaid patients in Massachusetts. The study goal was to identify an appropriate respondent at each service delivery site and to conduct an interview concerning “cultural competence” at the site. Cultural competence was operationally defined as a collection of service delivery options addressing the linguistic, racial, and ethnic needs of the client populations being served.

We were working from a poor quality list that contained a variety of ineligible or misidentified addresses. Moreover, concerns emerged that the information sought might be more accurately reported by a group of individuals at the site, each telling only part of the story, than by a single proxy reporter. Because of these circumstances, we were not sure exactly how to calculate an accurate response rate or to what our sample actually generalized. However, purely from a data collection point of view, it was clear to us that using proxies made the interview more efficient and could be expected to improve both the response rate and the completeness of the data. So, we made a decision to use proxy reporters, which resulted in a tension between the efficiency gained and our worry over the potential for diminished accuracy in the information being gathered. This paper concerns itself with both the coverage of sites and the accuracy of reports obtained from the proxies selected.

2. COVERAGE OF SITES

The goal was to interview a knowledgeable staff person at each physical location (address) in Massachusetts where primary care Medicaid services were being delivered. Our imperfect list of all the sites providing Medicaid services contained duplicates based on 1) more than one listed provider delivering services at the same site, and 2) sites with entrances from more than one street address.

The list also contained a variety of addresses at which 3) primary care services were not available or 4) which had closed since the list had been created. Existing sites at which no primary care was delivered tended to be either specialty care referral sites, or administrative locations that were only used for billing purposes.

In addition, the master list of addresses was fixed in time and the real world situation was dynamic. Physicians were opening and closing individual offices and clinics all the time. So, in this milieu, without some basis for independently determining what the real number of sites was, we had no way of knowing how well we had done in covering them.


3. ACCURACY OF PROXY REPORTS

As is frequently the case with establishment surveys, we knew we were placing a serious burden on the proxy respondent to report information for the whole site. There were a number of questions that encouraged the selected respondent to estimate the sizes of the various sub-populations served, the languages spoken by clients, and the type and quality of support provided at the site to meet the clients’ cultural needs. And while we worked hard to restrict our questions to assessments that were broad in content and readily answerable by a knowledgeable reporter, there were also a number of questions that were very focused and specific. Chief among these was a series of questions that required the respondent to identify the languages in which primary care was being delivered at the site. We asked the knowledgeable proxy to tell us, separately for each physician, the languages in which he or she was capable of delivering services. We had no way to know how accurate the proxy reporters were.

4. OPPORTUNITY TO EVALUATE

We were fortunate that Network Health, a managed care organization with 31 primary care centers in 7 communities, was able to provide us with a directory on which to base an assessment of our site identification and data collection methods. Their directory not only contained the physician names, work addresses, and telephone numbers we needed to help us understand how well we had covered their sites, but it also contained information on the languages spoken fluently enough for service delivery by each of their practitioners. In the first part of the analysis we examine the degree of match between our data and their directory regarding the delivery of primary care services. We also calculate a more accurate (and lower) response rate than the one arrived at previously.

The second part of the analysis is devoted to how well we did in identifying all the physicians who actually delivered primary care services for Network Health. Since physicians usually serve more than one site, we hoped that our coverage with respect to the individual physicians in the plan would prove to be better than our overall coverage of the sites within the plan. That is, we hoped that, whatever the coverage of sites for a plan, the coverage of physicians serving the plan would exceed it as a percentage of the total.

The third, and last, focus of our analysis addresses the accuracy with which the selected respondent for each site was able to report the languages spoken fluently enough by each practitioner to deliver services. This gets to the heart of the matter with respect to using proxy reporters. The information to be reported on was factual in nature and was information that should have been reasonably well known by any knowledgeable staff person.

5. DATA

The data collection was designed as a 1998 census of all primary care service delivery sites where Medicaid services were being delivered. Massachusetts had a newly created database that covered both sites that were approved by the state and billed directly to the state, and sites that were maintained by HMOs and billed through the HMOs to the state. The database was new and was known to be somewhat flawed. The State had procedures in place to gradually cleanse the errors over time, but the perceived urgency with respect to obtaining data on cultural competence precluded waiting for the database to be fixed. That meant that a substantial burden for editing the contact information fell to the research team. Participation in the study was encouraged but voluntary; sites were not required to participate.

6. SITE COVERAGE ACHIEVED FOR NETWORK HEALTH

Table 1 presents our findings with respect to matching information between our data collection effort and the Network Health Plan sites. Column 1 represents the list of sites in the Network Health Directory. We edited the list of sites in column 1 by calling each site to confirm its eligibility. Five sites were found to be ineligible for the study (shown in column 2) as they provided no primary care services. The residual distribution of sites appears in column 3. Column 4 shows that CSR matched 25 out of 30 (83%) of the primary care and obstetrics/gynecology addresses with DMA data and 4 out of 4 hospitals. We completed interviews with 24 of these sites. Thus, our matching effort on the two groups combined netted us an address match on 29 of 34 and interviews at 24 (71%) of the sites.

7. PHYSICIAN ROSTERS

Our approach to identifying and obtaining information on the physicians at each given location was complex. Before gathering site-specific physician information, we knew we needed to edit the roster at each site. We did that by confirming each listed physician’s presence, one by one, at the start of the telephone call. Physicians listed but not delivering services at the site were crossed off; physicians not on the list, but who were delivering services at the site, were added.


Once we had an edited roster at each site, we actively gathered information on only a subset of the physicians present. This was done for two reasons. First, at the largest sites, the response burden on the proxy reporter could become overwhelming if they were expected to report on all the physicians present. Second, we knew physicians usually delivered services at several sites, and we knew we didn’t need to pursue all physicians at all sites.

Our strategy then became, for each physician, to pick his or her site with the fewest other practitioners present. That is, if a physician was expected to be delivering services at sites with 20, 7, and 4 physicians respectively, we chose the 4-physician site to obtain the proxy information for him or her. This decreased the response burden at the largest sites at the same time it increased the likelihood that the proxy would know the physician well enough to accurately report the language information we sought. In addition to collecting information about this subset of physicians pre-identified from each site roster, we also collected information on all physicians who were added to a site when we corrected the roster. We did this because we did not know in advance if we would get any information on the newly listed physicians from any other place.
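The site-selection rule reduces to a few lines of code. The sketch below is a minimal Python rendering under assumed data shapes (a mapping from site to its edited physician roster); it is illustrative, not the study's actual implementation.

# For each physician, keep the site with the smallest edited roster, where the
# proxy is most likely to know him or her well. Rosters are hypothetical.

rosters = {
    "site_large": ["dr_a", "dr_b", "dr_c", "dr_d", "dr_e"],
    "site_mid": ["dr_a", "dr_c", "dr_f"],
    "site_small": ["dr_a", "dr_g"],
}

def assign_report_sites(rosters):
    assignment = {}
    for site, physicians in rosters.items():
        for doc in physicians:
            # Replace a previous choice only if this site has fewer practitioners.
            if doc not in assignment or len(rosters[site]) < len(rosters[assignment[doc]]):
                assignment[doc] = site
    return assignment

print(assign_report_sites(rosters)["dr_a"])  # 'site_small' (only 2 practitioners)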

8. PHYSICIAN COVERAGE ACHIEVED FOR NETWORK HEALTH

Using the method described, we were able to gather information on 204 of the 213 primary care physicians working for Network Health Plan in 1998. This is 96% of the listed physician staff and was something of a surprise, since we had only completed interviews at 71% of the plan’s sites. It was clear to us that the study design had benefited greatly from the fact that physicians usually do deliver services at more than one place. The same underlying reason that kept us from receiving reports on them at one place (bad rosters) resulted in physicians being added and reported on at another. Table 2 shows the distribution of physician reports for all 213 Network Health physicians. While 9 had no reports on them at all and 123 had one report on them, 81 physicians (38%) had been reported on at more than one site. One person was reported on at 5 different places.

9. LANGUAGES SPOKEN

At this point we switch our focus from the survey coverage of sites and physicians to the quality of reporting by the proxy reporters. For all but the 9 physicians for whom we had no report, we had proxy information from a “knowledgeable staff member” about which languages were being used in delivering services. Our question sequence for each physician included:

- Is English the Doctor’s native language?
- [If no:] What is his/her native language?
- To your knowledge, does the Doctor speak any (other) foreign languages?
- [If yes:] What other languages does the Doctor speak?
- [For each language:] Is he/she fluent enough to practice medicine in that language?

So, for 204 physicians we had proxy reports about which languages they were able to speak with their patients. As with the site and physician information we collected, we used the Network Health Provider Directory to validate our findings. When disagreements emerged, we reconciled the findings by calling or visiting the sites and talking to either the physician him- or herself or a staff member.

10. MATCHES

The results of our language match can be found in Table 3. The top 3 rows show that for 147 physicians the languages mentioned by the proxy reporters exactly matched the languages listed in the Provider Directory. Eighty-four, 55, and 8 physicians were reported as speaking no second language, the same second language, and the same two non-English languages with patients, respectively. This accounts for 69% of the entire physician list. The remaining rows of the table characterize different degrees of matching. Rows 4 and 5 show that there were 27 additional cases where the Provider Directory and the proxy reporter agreed with respect to one or more non-English languages, but where one source or the other had at least one additional language listed.
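A simple set comparison captures the match categories used in Table 3. The sketch below classifies one physician's two language lists; the language data and the category labels are illustrative paraphrases of the table's rows, not actual study records.

# Compare the proxy-reported languages (CSR) with the directory listing (NHP)
# for one physician. Guards on emptiness keep the blank-list cases distinct
# from the subset cases.

def classify(csr, nhp):
    csr, nhp = set(csr), set(nhp)
    if csr == nhp:
        return "exact match"
    if nhp and nhp < csr:
        return "CSR matched all NHP languages but had more listed"
    if csr and csr < nhp:
        return "NHP matched all CSR languages but had more listed"
    if csr and not nhp:
        return "CSR has languages listed, NHP was blank"
    if nhp and not csr:
        return "NHP had languages listed, CSR was blank"
    return "lists overlap partially or are completely different"

print(classify(["Spanish"], ["Spanish"]))                # exact match
print(classify(["Spanish", "Portuguese"], ["Spanish"]))  # CSR ... had more listed
print(classify(["Haitian Creole"], []))                  # CSR has languages listed, NHP was blank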

Rows 6 and 7 show the number of cases where one source indicated the physician had no language capability at all and the other source had one or more fluencies indicated. The contrast between the number of cases in rows 6 and 7 (25 vs. 2) suggests that the proxy survey interview elicited a few more reports of spoken languages, overall, than the Provider Directory. As we began to reconcile these cases, it often turned out that the additional languages reported by the proxy were, in fact, languages that the physicians spoke with patients, but which they did not want listed in the directory. Some were languages they did not speak particularly well, and others were languages they did not wish to have listed because they didn’t wish to attract too many patients from one ethnicity or cultural background to their practice.

11. DISCUSSION

Our interview yield over all Network Health primary care service delivery sites came to 71% in a study with flawed establishment addresses and a voluntary interview schedule. With respect to covering the primary care physicians in the plan, we managed to obtain reports on 96% of all the physicians listed. This exceeded our expectations and was a direct result of physicians’ work patterns, which often included providing services at several different sites. For about 38% of the physicians we had more than one proxy report. If, as had been talked about at the time, the Massachusetts DMA had created a language database for the whole state based on our data collection effort, we are confident that few physicians would have been left out. Creating a directory, with a mechanism to update the listings interactively as referrals were made, was a feasible next step.

The result from the language match itself was also very encouraging: 69% of all Network Health physicians had exact matches between the proxy report and the Provider Directory, and another 13% of the reports substantially overlapped. Moreover, among the mismatches were a number where the proxy report proved more correct than the listing in the HMO’s own Provider Directory. Some of these cases were artifacts of physicians’ desires to not have all their languages listed in the directory. Thus, we place our overall language matching success rate somewhere in the 80% range.

12. CONCLUSION

In 1998 we designed an establishment study covering MassHealth sites across Massachusetts that started with imperfect information on site locations, and a respondent selection protocol we hoped would provide accurate information about which languages were spoken. The original data collection asked “knowledgeable staff persons” about the languages spoken by physicians in providing health care services. The goal in this paper has been to validate our data and confirm our results.

We were lucky to find an HMO (Network Health) that was participating in the Massachusetts Medicaid program (MassHealth) and that also had a comprehensive directory with all its primary care sites, physician names, and languages spoken; all listed separately and individually. We pursued a matching strategy to provide some perspective on what our study yielded.

A poor quality site list did not prevent us from obtaining data on 96% of the physicians in the plan. Our study design capitalized on the fact that physicians typically work at more than one service delivery site. In addition, the proxy reports on languages spoken turned out to be of high quality, as measured against those recorded in the Network Health Directory.

Overall, the coverage of the sites, and the accuracy of the proxy reports achieved, provided validation of the methods we used. Establishment surveys with proxy reporters can be a very useful tool for gathering factual information about the way things work internally at a site. Our two caveats for successful execution are: a) that maximum effort be devoted to obtaining a good quality site list at the beginning, and b) that special care be taken to pretest instruments in ways that reveal what proxy reporters can and cannot report accurately.

A limitation of the study is that we do not know how representative this HMO is of all the HMOs participating in the MassHealth program. The information given to us suggests that it is smaller than most other participating HMOs and that it does a somewhat better job of tracking the characteristics of its providers. If true, this would imply that our survey results should fill in the gaps with much needed information for the HMOs working with less complete information about their staffs. That was the hope of the DMA in doing the study. However, we do not wish to over-sell our results. It is probably enough to say that we think our results are positive and that we can recommend our methods for a variety of situations where information is being sought in census-like fashion across establishments over the telephone.


Division of Medical Assistance Site Study (1998)

Table 1: CSR Survey Coverage of Network Health Primary Care Delivery Sites

Type of Site | Network Health Directory Address Listing | Directory Corrections | Residual Network Health Sites | Sites Matched by CSR/DMA Data (Interviewed) | Unmatched
Primary Care/OBGYN | 31 | 1 No Primary Care Offered | 30 | 25 (22) | 5 Not Covered at All (2 Found by CSR)
Hospital | 8 | 4 Main Hospital Phone Numbers, No Primary Care | 4 | 4 (2) | 0
Total | 39 | 5 | 34 | 29 (24) | 5

Division of Medical Assistance Site Study (1998)

Table 2: Survey Coverage of Network Health Physicians

Number of Sites Reporting on Physician | Number of Physicians in Category
0 | 9
1 | 122
2 | 64
3 | 13
4 | 4
5 | 1
Total | 213


Division of Medical Assistance Site Study (1998)

Table 3: Language Matches Among Doctors: CSR Data Collection vs. Network Health Plan Directory

Match Between NHP & CSR | n | % | CSR Correct | NHP Correct | Neither Correct | NA
Same: no language indicated by either CSR or NHP | 84 | 39.4% | | | |
Same: matched on 1 language | 55 | 25.8% | | | |
Same: matched on 2 languages | 8 | 3.8% | | | |
Total Matched on Languages | 147 | 69% | | | |
CSR matched all NHP languages but had more listed | 13 | 6.1% | 4 | 4 | 5 |
NHP matched all CSR languages but had more listed | 14 | 6.6% | | 9 | 3 | 2
CSR had languages listed, NHP was blank | 25 | 11.7% | 6 | 13 | 1 | 5
NHP had languages listed, CSR was blank | 2 | 0.9% | | 1 | 1 |
Same number of languages but each had one the other did not | 1 | 0.5% | | 1 | |
Languages listed but lists completely different | 2 | 0.9% | | 2 | |
Total Mismatched on Languages | 57 | 26.8% | 10 | 30 | 10 | 7
Total Matched and Mismatched | 204 | 95.8% | | | |
Doctors Not Matched At All | 9 | 4.2% | | | |
Total NHP Doctors | 213 | 100% | | | |


1 This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform interested parties and to encourage discussion. The authors acknowledge helpful review comments from Douglas Bond and Anna Chan as well as input from Judy Dodds. Also, the authors wish to thank Diane Willimack for reviewing multiple versions and providing countless suggestions for improvement.


RESULTS OF COGNITIVE INTERVIEWS STUDYING ALTERNATIVE FORMATS FOR ECONOMIC CENSUS FORMS

Kristin Stettler, Rebecca Morrison and Amy E. Anderson, U.S. Census Bureau1

Kristin Stettler, U.S. Census Bureau, 4700 Silver Hill Rd, FOB #4, Room 3110, Suitland, MD [email protected]

ABSTRACT

The Economic Census is conducted using self-administered forms tailored to major industries. Tabular (“spanner”) layouts have been used for obtaining product detail data for the manufacturing sector, while other sectors have used an indented layout. Cognitive interviews, using vignettes, were conducted at manufacturing establishments to explore whether changing the layout would affect respondent reporting, specifically related to “first-line bias,” the reporting of aggregate data on the first detailed line available. Little difference was found in the reporting of data with either layout. We recommend using the indented layout to increase consistency with other economic census forms.

Key Words: Questionnaire design, Establishment surveys, Tabular layout, Graphical formats, Self-administered forms, Vignettes

1. INTRODUCTION

The Census Bureau is interested in identifying better procedures for collecting detailed data on products and services, in order to minimize misreporting. This paper reports the results of cognitive research on alternative formats for these items, which have multiple levels. The research was motivated by an initiative to achieve consistency, where appropriate, in forms design for the 2002 Economic Census for the various economic sectors. Thus, we focused on the economic census of manufactures, which historically has used a different format from the other economic sectors.

The primary goal of the research was to determine if converting from a tabular layout to an indented layout for the product lines on the census of manufactures would result in less accurate reporting, particularly first-line bias, which is the reporting of aggregate figures on the first detailed line available. We begin with a description of the manufacturing census, the research problem and a review of relevant literature. This is followed by a description of our research methodology. After detailing our research findings, we close with our conclusions.

2. BACKGROUND

2.1 Overview of Economic Census of Manufactures

The census of manufactures, one part of the overall economic census program, is a mandatory mail-out/mail-back census sent to approximately 280,000 multi- and single-unit establishments every five years. The purpose is to provide periodic statistics about manufacturing establishments, activities, and production.

Respondents receive either the manufacturing long form (up to 16 pages) or the short form (4 pages), which obtain basic data from all establishments including their kind of business, geographic location, type of ownership, total revenue, annual and first quarter payroll, and number of employees. The long form also requests detailed statistics for inventories, capital expenditures, materials consumed, cost of materials, energy consumed, and quantity and value of shipments for roughly 11,000 products. Establishments receiving the long form include those previously selected for the Annual Survey of Manufactures (ASM) sample, establishments above a payroll cutoff (which varies by industry) and all multi-units.


The economic data from this census is used for many purposes: to benchmark gross domestic product estimates and producer price indexes, to prepare indexes of industrial production, to assist in forecasting economic conditions, and for planning, analysis, and decision making.

2.2 Description of Problem

The economic census is collected using multiple self-administered forms with content and graphical formats tailored for each major industrial sector. Forms for manufacturing industries have traditionally used a tabular format (called a “spanner”) to display successive levels of detail for collecting product data (see Figure 1). Other industry forms have traditionally indented these levels (see, for example, Figure 2, which presents product detail for a service industry). The census of manufactures collects up to five levels of product detail and does not allow for aggregate reporting. Some other industries allow for respondents to report either aggregate level data or detailed data. No other industry collects more than three levels of product detail. All forms indicate that estimates are acceptable. In planning for the 2002 Economic Census, efforts are being made to achieve consistency in design elements across various forms. Therefore, it was proposed that the manufacturing forms be converted to use indentation for the product lines.

Figure 1: Example of Tabular Layout (aka “Spanner”) Used for Cognitive Tests (Electronic Components and Accessories)

Figure 2: Example of Indented Layout from Economic Census - Services Sector

Analysts and managers in the manufacturing program area were concerned about this potential change since there was Census Bureau lore about substantial reporting errors caused by use of an indented layout on the 1967 economic census forms. Many respondents to the 1967 census had to be recontacted (delaying analysis and publication of the results) to obtain detailed data because they failed to break out their product information; instead they reported aggregate information only, entering it on the first detailed product line. Because of the way the questionnaire was formatted, it appeared to many respondents that this was acceptable (see Figure 3 for a portion of the form). This aggregate data got keyed as if it was the response for the first detailed line only, thus resulting in over-reporting of specific products listed on these first lines. This has been called “first-line bias.” Therefore, minimizing first-line bias was one of the primary objectives in 1) reformatting the manufacturing forms to incorporate indentation and 2) testing the reformatted forms.


Figure 3: Example of Indented Layout from 1967 Economic Census - Manufacturing Sector

2.3 Literature Review

Cognitive interviewing has been a valuable tool for developing questionnaires, especially for surveys of households and individuals. This same tool can also be used to develop and improve questionnaires for business establishments. Indeed, the same cognitive interviewing methods used for individuals can also be used for businesses: think-aloud techniques, vignettes, and probes.

The process a business respondent goes through to fill out a survey is similar to that of an individual, but with the added complication of needing to access information from external sources (Edwards and Cantor, 1991). It is precisely because of this additional burden that establishment survey forms should be clear in terms of the information being requested. Cognitive interviews are one way of determining how to present requests for information to respondents. One of the purposes of the cognitive interview might be to determine how easy it is for respondents to access the information needed (Gower and Nargundkar, 1991).

Cognitive interviews have also been used to test questionnaire formatting and layout. Jenkins and Dillman (1993, 1997) addressed issues of visual language in addition to verbal language, outlining principles for designing respondent-friendly questionnaires, including placement of instructions and explanatory information so respondents will read them. Von Thurn and Moore (1994) also researched format issues related to the American Housing Survey. Zukerberg and Lee (1997) discussed steps taken in a particular survey to make it look less burdensome and more attractive.

3. RESEARCH METHODOLOGY

From February to April 2000, we conducted cognitive interviews with respondents from 17 single- and multi-unit establishment reporters who would have been eligible to receive the economic census had it been conducted in 1999. We conducted interviews with five or six companies in each of three industries (Electronic Components and Accessories, Blast Furnace and Basic Steel Products, and Apparel), where there have been reporting problems historically. Selection criteria included payroll, employment, industry heterogeneity/homogeneity, response experience based on cooperation and data quality, and public/private status.

Census Bureau staff from the economic directorate conducted the interviews: a survey methodologist was accompanied by a subject matter specialist. Interviews took place at companies in the mid-Atlantic region. Participants were recruited through their company contacts. For nine of the cases (when time allowed), we mailed a confirmation letter, along with one version of the form, asking respondents to complete the products section before the interview.

Most interviews were audiotaped, with the respondent's permission, to facilitate accurate summarization of the results. Respondents were informed that their response was voluntary and that the information they provided was confidential. Interview lengths ranged from 45 to 90 minutes; the average was an hour.

As can be seen in Figure 1, the tabular version of the census of manufactures form does not have a place for reporting aggregate level data. As the indented version of the form was prepared for testing, emphasis was placed on reducing first-line bias. For the indented version, hash marks were placed in the answer boxes at the aggregate level to indicate to respondents that aggregate level data was not acceptable (see Figure 4).


Figure 4: Example of Indented Layout Used for Cognitive Tests (Electronic Components and Accessories)

The layouts were evaluated using two different methods: think-aloud/debriefing and vignettes. Half of the respondents received the original, tabular version to complete for 1998 or 1999 (whichever was more convenient for them) using their company's data, while the other half received the alternative, indented format. We asked respondents to begin completing the form where the spanner or indent layout began. If the respondent was unable to access actual data, we asked them to describe how they would identify their products/materials on the form, how they would locate relevant records, and then enter the data on the form. We used a think-aloud method, encouraging them to talk out loud when reading questions, locating data in records, and entering data on the form.

We then introduced the vignette, using the layout not yet tested. A vignette is a tool that allows researchers to study respondents' understanding and treatment of a survey instrument. Traditionally, vignettes have been presented as short narratives that describe a particular situation of interest (Gerber, 1996). Respondents are asked to interpret the situation and then apply it to the survey instrument being studied. We adapted this approach to study how a respondent uses the tabular and indented layouts in the product section of the census of manufactures form. The intent in using the vignette was to uncover possible effects that the tabular or indented layout may have on the accuracy of respondents' reporting, particularly with regard to first-line bias.

To ensure that our vignettes were comprehensible to the respondents, separate vignettes were developed for each of the three industries being studied. The vignette was presented as a mock internal document from a fictitious company that contained product descriptions and values (see Figure 5).

Figure 5: Example of Vignette Used for Cognitive Tests (Electronic Components and Accessories)


Some product descriptions in the vignettes were designed to be detailed while others were vague. For several of the products that contained numerous levels, descriptions were given that did not completely characterize the product of interest. Respondents had to decide at which level they would report. Respondents were asked to complete the product section of the form using the information provided on the mock internal document.

After the respondent completed the vignette, we asked both scripted and impromptu follow-up probes, focusing on the layout and usability of each form. We then asked the respondent to compare the two forms in terms of ease of completion and clarity. We also asked them to state a preference. If it had not come up previously, we explained the issue of first-line bias and asked for additional feedback. Finally, we asked for any other recommendations or suggestions for improving the form.

4. RESEARCH FINDINGS

In general, there did not seem to be a strong preference for one version of the form over the other. Although each individual tended to have a preference (often strongly felt and expressed), there was about an equal split across all the participants (nine preferred the indented version, seven the tabular layout, and one had no preference).

The nine respondents who preferred the indented version tended to feel that it was more clear and less cluttered. Examples of comments about the indented version include:

• "Seemed more precise -- easier to read and follow."
• "Doesn't have lines so can read more quickly."
• "This is a form that was developed by an accountant -- simple and to the point."
• "More space on form -- more resting on eyes to look at."
• "Less complex; clearer; easier to understand."

Several respondents mentioned that they had been trained in school to use outlines for organizing complicated material, and so this version made it easier for them to keep track of the organization of the items and subitems. Several respondents talked about how they normally read text across a page, as in the indented version. The tabular version, particularly those items with multiple columns where each column was narrow, was more difficult for these respondents to read since they had to keep shifting their eyes down to the next line in each category, instead of reading across. This was an interesting finding because although the actual words in each category were the same, it was evident that the layout of the form affected how difficult it was for respondents to read and comprehend it.

The seven respondents who preferred the tabular version tended to feel it made it easier to quickly find a defined area and to understand the organization of the items. Several respondents noticed that this version of the form appeared shorter (because of spacing issues, the tabular version tended to be a page or two shorter than the indented version). Examples of comments about the tabular version include:

• "Left column stays with you so you always know the overall category."
• "Putting header categories to left catches your attention more."
• "There was less continuous text across the page."

Because first-line bias is a rare event (though costly when it occurs), we were concerned that we would find little evidence of it during the part of the interview where the respondents entered their own company's data (thinking aloud, if possible). No respondent had completed all the relevant parts of the form in advance as we had requested in our pre-interview letter. Also, during the interview, we were unable to get most respondents to access their company's data for the prior year, let alone enter it on the form. So as a fallback, we discussed with respondents the processes they would use to identify categories on the form for which they would enter data and how they would go about accessing that data in their company records. Therefore, it was extremely difficult to determine from this interaction whether first-line bias would be an issue with either version of the form. Some respondents did note that the level of detail requested would be difficult to provide accurately. Most felt that they could provide reasonable estimates.

Fortunately, the vignettes were designed to encourage respondents to make difficult choices that might result in first-line bias. In addition, they required that each respondent write numbers down on the form, as they would if completing a form with their own data. Thus, we were able to evaluate whether the version of the form was related to the likelihood of a respondent entering data incorrectly.

There was very little evidence from the vignettes to suggest that either version of the form would result in more first-line bias. Most respondents began by noting that they would try to find the correct detailed information. At a minimum, most respondents indicated they would write a note that they were reporting aggregated data -- key punch operators would flag these and subject matter analysts would be able to estimate or call back respondents in these cases.

Many respondents chose to split the aggregated totals from the vignette among the detailed choices, equally or based on the estimated percentage in each subcategory. Many respondents made an informed decision about the detailed line on which to report the aggregated total. They based these decisions on their knowledge of the industry or on a sense of which product subcategory was the largest.
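The splitting behaviour these respondents described amounts to simple proration. Purely as an illustration of that arithmetic (the routine and figures below are invented for this example, not taken from the study):

    # Illustrative proration of an aggregate total across detailed lines,
    # mirroring what respondents described doing by hand.
    def prorate(total, shares):
        """Allocate total across lines in proportion to the given shares;
        equal shares produce an equal split."""
        denom = sum(shares)
        return [total * s / denom for s in shares]

    print(prorate(90000, [1, 1, 1]))        # equal split: three lines of 30000.0
    print(prorate(90000, [0.5, 0.3, 0.2]))  # split by estimated subcategory share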

A few respondents did report aggregated data on the first detailed line, but they did so equally on both versions. When asked, they admitted they would have done so on both versions of the form.

5. CONCLUSIONS

In general, our cognitive interviews found that whether a respondent entered aggregate level data on the first detailed line was not related to the layout of the form received. The cognitive testing we conducted did not indicate that the revised indented version of the economic census would result in greater measurement error for the manufacturing establishments we studied. We found no reason to prefer the tabular ("spanner") format over the indented version. There was no indication that the first-line bias problems found in the 1967 attempt at indentation would occur with the formatting that we tested.

Given the Census Bureau effort to achieve consistency in design elements across various forms for the 2002 Economic Census, our recommendation, based on the results of these cognitive interviews, is to convert the product line items on the manufacturing forms to indentation. This will also allow for simpler creation of automated forms.

Besides acquiring sufficient knowledge to address the specific question about formatting economic census forms, our research suggests the following conclusions about the efficacy of cognitive testing in the establishment setting:

• Formatting can affect measurement error. This was evidenced by the first-line bias exacerbated by the 1967 formatting, but alleviated by the indentation formatting tested.
• Cognitive testing can be used to learn about this. Our study used cognitive research methods to delve into the effect formatting had on first-line bias in the economic census.
• Vignettes work in an establishment setting to overcome obstacles of data availability. Respondents in our cognitive interviews were hesitant to take the time to access their records to complete the form. Therefore, our use of vignettes allowed us to watch respondents use the layout of the form for determining how to enter data.
• Using multiple methods helped us learn about all steps in the cognitive response process and gave us greater confidence in our conclusions. The debriefing portion of the interview allowed us to see how respondents reacted to the layout of the form given the availability of their own company's data. The use of vignettes allowed us to see how respondents reacted to the form's design when the provided data did not fit perfectly with the available answer categories. Although respondents used different methods for determining how to enter the vignette data, we did not see differences attributable to the form's layout.

6. REFERENCES

Edwards, W. S., and D. Cantor (1991), "Toward a Response Model in Establishment Surveys," in P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, and S. Sudman (eds.), Measurement Error in Surveys, New York: John Wiley & Sons, pp. 211-233.

Gerber, E. R., T. R. Wellens, and C. Keeley (1996), "Who Lives Here?: The Use of Vignettes in Household Roster Research," paper presented at the Annual Meeting of the American Association for Public Opinion Research, Salt Lake City, Utah.

Gower, A. R., and M. S. Nargundkar (1991), "Cognitive Aspects of Questionnaire Design: Business Surveys Versus Household Surveys," Proceedings of the 1991 Annual Research Conference, Washington, DC: Bureau of the Census, pp. 299-312.

Jenkins, C. R., and D. A. Dillman (1993), "Combining Cognitive and Motivational Research Perspectives for the Design of Respondent-Friendly Self-Administered Questionnaires," paper presented at the Annual Meeting of the American Association for Public Opinion Research, St. Charles, Illinois.

Jenkins, C. R., and D. A. Dillman (1997), "Towards a Theory of Self-Administered Questionnaire Design," in L. Lyberg, P. Biemer, M. Collins, E. DeLeeuw, C. Dippo, N. Schwarz, and D. Trewin (eds.), Survey Measurement and Process Quality, New York: Wiley-Interscience.

Von Thurn, D., and J. Moore (1994), "Results from a Cognitive Exploration of the 1993 American Housing Survey," paper presented at the Annual Meeting of the American Association for Public Opinion Research, Danvers, Massachusetts.

Zukerberg and Lee (1997), "Better Formatting for Lower Response Burden," Proceedings of the Survey Research Methods Section, American Statistical Association.


HARMONIZING SURVEY CONTENT: INTEGRATING CANADA'S ANNUAL SURVEY OF MANUFACTURES INTO THE UNIFIED ENTERPRISE SURVEY

John S. Crysdale, Statistics Canada
Room 1105, Main Building, Ottawa, Ontario, K1A 0T6, Canada

[email protected]

ABSTRACT

Many of Statistics Canada's business surveys are being redesigned and integrated into the Unified Enterprise Survey (UES). One of the most significant surveys to be integrated is the Annual Survey of Manufactures (ASM). The present paper addresses (1) the survey integration process, (2) the nature of and rationale for additions made to ASM content, deletions, portions left intact, changes contemplated but not made, and (3) format issues, including personalization, splitting into separate surveys, reworkings, and the use of reporting guides. The paper concludes with a look to the future and a summary of improvements made.

Key Words: Project for the Improvement of Provincial Economic Statistics (PIPES), survey redesign, personalized questionnaires.

"I have answered three questions, and that is enough,"
Said his father; "don't give yourself airs!"

Lewis Carroll (Alice in Wonderland)

1. SETTING THE STAGE

1.1. The Unified Enterprise Survey (UES)

Statistics Canada is currently engaged in a project to improve provincial and territorial economic statistics. As part of this work, many of its business surveys are being redesigned and integrated into the framework of the Unified Enterprise Survey (UES). A number of previously unsurveyed industries are also being covered through this same framework. Although referred to in the singular, the Unified Enterprise Survey is actually a series of surveys using common concepts and processes and having a coordinated survey frame.

The expanded and improved data that result from the UES will enable Statistics Canada to produce—on an annual basis—high-quality provincial and territorial input-output tables. Among their other uses, these tables—together with improvements at the national level—will help different levels of government in effecting revenue-sharing arrangements.

There are three major questionnaire types within the UES. The first type covers "complex" enterprises—those operating in more than one province, operating in more than one industry, or comprising more than one company. The second covers establishments of complex enterprises. The third covers establishments of "simple" (i.e., non-complex) enterprises. There is also a head office questionnaire and an assortment of schedules and supplements.

1.2. The Annual Survey of Manufactures (ASM)

The Annual Survey of Manufactures is one of Statistics Canada's most important business surveys. It is also one of Statistics Canada's longest-running surveys—having been conducted annually since 1917. Now, under the North American Industry Classification System (NAICS), the ASM covers 259 manufacturing industries. The Annual Survey of Forestry and Logging (ASF) is conducted in conjunction with the ASM. The ASF covers five logging and forestry industries. In this paper, the two surveys—which collectively cover close to 20% of GDP—will henceforth be referred to as the ASM.

ASM data releases comprise operating statistics (usually referred to as "principal statistics") and detailed commodity information—both at various levels of industrial and geographic detail. Users of these data include other federal government departments—such as Finance Canada, Industry Canada, and Natural Resources Canada—provincial and territorial statistical agencies, provincial and territorial economic development ministries, international organizations such as the OECD and the UN, as well as industry associations, consultants and academics. These groups use the data in the calculation of federal-provincial equalization payments, in the elaboration of regional inequities, in market studies, in the resolution of trade disputes, and in work relating to productivity, to the environment and to new products and processes. The data are also used within Statistics Canada—in the System of National Accounts to compile the input-output and income and expenditure accounts, in Prices Division to construct price indices, in the Monthly Survey of Manufacturing to benchmark the leading indicators (shipments, inventories and orders), and in the Analytical Studies Branch to assess, among other things, the determinants of success and growth in the manufacturing sector. This latter—longitudinal—work takes advantage of the fact that complete manufacturing microdata extend back to the early 1970s, with each establishment uniquely identified so that it can be tracked through time.

There are nearly 130,000 establishments in the manufacturing universe. Roughly 22,000 of these are surveyed by means of questionnaires. The rest are covered by administrative data sources. ASM questionnaires comprise long-form questionnaires for large manufacturers (those with large manufacturing establishments or multi-establishment operations), short-form (less detailed) questionnaires for small manufacturers, and a separate questionnaire for head offices and ancillary units.

The main components of the long-form are: inventories, unfilled orders, fuel and electricity, material inputs (including containers), shipments, shipments by destination, and labour expenses. Within the fuel, inputs, containers, and shipments sections, detailed commodity information is also requested. There are 26 different master versions of the long-form—constructed generally for groups of industries. The variations between these templates relate almost entirely to the detailed commodity questions. All long-form questionnaires are personalized to reflect the past commodity reporting of the subject establishment.

2. CREATING THE ASM/UES QUESTIONNAIRE—PROCESS

In order to provide a starting point for discussions with representatives of survey areas that are about to be integrated, a set of generic—or "model"—questionnaires has been developed. These reflect the basic content requirements of the System of National Accounts and the basic format requirements of the collection areas. The models are structured and sequenced along the lines of a typical income statement and use standard accounting terminology. The main components relate to revenue, expenses and inventories, as well as to revenue distribution by customer type and revenue distribution by customer location. Within these core components, the models provide a number of standard menus. For example, there are several versions of the purchased service expenses module—each differing from the other in terms of the specific items listed and the levels of detail requested. In addition, there are a number of administrative components—such as contact identification, reporting period and main business activity information, and an enumeration of events that might have caused significant change from data reported in the previous period.

The process of creating an integrated ASM/UES questionnaire began by merging the content of the ASM long-form and the content of the corresponding UES model questionnaire. Duplication was then eliminated. The resulting questionnaire was referred to as the "superset".

Following the creation of this questionnaire, there ensued a consultation process in which the interested parties took the superset and went through it question by question. Participants in the discussions included representatives from the ASM and UES, the System of National Accounts, Prices Division, provincial statistical agencies, as well as the service areas involved in collection, capture, edit, imputation and frame maintenance. For each question, the following sorts of considerations were addressed: Were the data needed? If so, by whom and for what reason? How essential were the data? Was the question applicable to manufacturing? Was it empirically significant? Was it straightforward to ask? Was the terminology well understood? Did it correspond to existing record-keeping practices? If not, was it, nevertheless, simple to answer? Was it better asked at a greater level of detail or at a lesser? Was there an alternative source for this information? Was it stable from year to year? Did it need to be asked every year?


Once these consultations had led to a solid draft, external testing began. Statistics Canada has a policy prescribing questionnaire testing. In this case, Statistics Canada's Questionnaire Design Resource Centre reviewed the draft and conducted one-on-one interviews with nearly fifty respondents.

Following development and testing of the main version of the new questionnaire, three industry-specific variants were also developed. These cover the logging, sawmills, and printing industries and involve small differences from the main form—the differences relate primarily to the main business activity question and to the inventory section. A questionnaire was also developed for small manufacturers, based on the detailed version of the ASM short-form and the UES model questionnaire for establishments of simple enterprises. Two industry-specific variants of this form were also developed; these cover the logging and printing industries. Work was also done to modify the UES head office questionnaire so that it would fit the needs of the ASM and would continue to fit the needs of other survey areas.

3. CREATING THE ASM/UES QUESTIONNAIRE—CONTENT

3.1. Additions to the original ASM questionnaire

The largest single addition to the manufacturing questionnaire was a set of thirteen questions related to purchased service expenses. Examples of such expenses include amounts paid to external businesses for transportation and storage services, and for legal and accounting work. Further examples include postage, courier, telephone, advertising and travel expenses, as well as royalties, insurance premiums, and franchise fees. Based on a special survey, these expenses were estimated—for reference year 1990—to be in excess of $40 billion—close to 15% of the value of shipments reported in the manufacturing sector. The thirteen purchased service items appearing on the manufacturing questionnaire were selected on the basis that they would meet the analytical needs of the System of National Accounts, that information about them was available in respondents' accounting systems, and that they were empirically significant in the manufacturing sector. The items were selected from a list prepared by Standards Division of Statistics Canada and based on the Central Product Classification of the United Nations.

The addition of purchased service expenses means that these expenses can be taken into account in the calculation of value-added. This measure of value-added is often referred to as "true" value-added. It differs from the measure, often referred to as "census" value-added, which does not take these expenses into account. True value-added will now be measurable at the establishment level. The coherence between enterprise- and establishment-based results can also be analysed.
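In stylized terms, the relationship between the two measures can be sketched as follows. The item lists are simplified for illustration and are not Statistics Canada's full derivations, which also involve inventory valuation detail.

    # Simplified sketch of the two value-added measures discussed above.
    def census_value_added(shipments, inventory_change, materials, fuel_electricity):
        # "Census" value-added ignores purchased service expenses.
        return shipments + inventory_change - materials - fuel_electricity

    def true_value_added(shipments, inventory_change, materials,
                         fuel_electricity, purchased_services):
        # "True" value-added also nets out purchased services (transportation,
        # legal and accounting work, telephone, insurance, and so on).
        return census_value_added(shipments, inventory_change, materials,
                                  fuel_electricity) - purchased_services

    # With purchased services near 15% of shipments (the 1990 estimate),
    # the two measures can differ materially at the establishment level.
    print(census_value_added(1000.0, 10.0, 400.0, 50.0))       # 560.0
    print(true_value_added(1000.0, 10.0, 400.0, 50.0, 150.0))  # 410.0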

Beyond the purchased service additions, several miscellaneous expense items have also been specified. These include management fees paid to head offices, depreciation, amortization, interest expenses, as well as allowances for bad debts, amounts donated and inventory adjustments. Two items have been added to the employment section—the employer portion of employee benefits, and provincial health and education payroll taxes. Water utility expenses have been added to the fuel and electricity section. On the revenue side, subsidies, royalties, franchise fees, and interest and dividends are now explicitly covered.

There are several other additions: (1) For System of National Accounts purposes, inventories held or on consignment abroad are now in scope. (2) For purposes of converting from fiscal-year to calendar-year data, temporary and seasonal closing information is now sought. (3) For frame purposes, businesses are asked to indicate their main business activity—manufacturing, retail/wholesale trade, services, other—and whether this represents a change from the previous year, and about their acquisition and disposal of other operating units. These questions are designed to trigger follow-up enquiries when required.


3.2. Deletions

A number of items have been deleted from the original ASM questionnaire: (1) Hours paid has been dropped because of low response, high estimation, and probable poor quality. This variable will be modelled from other sources. Hours worked information had already been dropped from the ASM due to reporting difficulty—hence similar concerns about response rates, estimation and quality. (2) Book value of new construction by own labour force and book value of machinery and equipment manufactured by own labour force for own use have both been dropped. These will continue to be asked on capital expenditure surveys. Neither fits into an income statement format. Both are considered difficult to collect. The two corresponding non-manufacturing input items have also been dropped. (3) The question relating to corporate status has been dropped inasmuch as most manufacturers are incorporated. The same information can be derived from administrative data sources. (4) Because of the small values involved, fuel inventories are now included in the residual inventory category—rather than as a separate item.

3.3. Ongoing Content

When originally developed, the UES model questionnaires for establishments were considerably influenced by the ASM long-form questionnaire. As a result, significant portions of the ASM long-form have remained intact in the actual conversion to the UES.

In terms of the original ASM long-form questionnaire: The inventories and fuel and electricity questions remain largely unchanged. The unfilled orders question is completely unchanged—and will continue to be customized by industry (so as not to be asked in cases where the data will not be used). Although the inputs section involves much more detail, the latter can be characterized, for the most part, as additions to ongoing content. The same can be said for outputs—with somewhat fewer additions. Progress payments have been put on a customized basis. The destination of shipments question is unchanged.

3.4. Changes contemplated but not made

One item contemplated but not added was a module relating to revenue by type of customer—i.e., sales to households, government, public institutions, financial businesses and other businesses. Most UES questionnaires request such a breakdown. This is done to permit a more precise estimate of spending by sector—particularly by the consumer sector. In the case of manufacturing, the consumer portion of this breakdown was seen as having limited empirical significance inasmuch as there is relatively little direct dealing between manufacturers and households. Accordingly, asking for the five-way breakdown seemed to impose more burden than having it would justify. The decision not to include this module reflects the cost-benefit tradeoffs faced generally in creating the ASM/UES questionnaire.

Also having limited empirical significance in the manufacturing sector was a module asking for information about joint-venture activity. As a result, and given the availability of alternative data sources, the joint-venture module was not added to the manufacturing questionnaire. A proposed module to help measure e-commerce was also dropped pending the outcome of a separate economy-wide survey on this topic. Also contemplated was the addition of a destination of shipments question for each output commodity and for goods purchased for resale (GPRS). Given the response burden and data quality issues, and given that overall destination data continue to be collected, neither module was incorporated.

4. CREATING THE ASM/UES QUESTIONNAIRE—FORMAT

4.1. Retain personalization?

One challenging aspect of ASM integration was the question of whether or not to retain personalization of the input, container and output commodity sections of the long-form questionnaires. Since reference year 1993, these sections have been personalized at the establishment level to reflect previous period reporting. Prior to 1993, commodity sections were the same for all long-form establishments in a given industry. These industry-based lists of commodity questions were generally long and, for any given establishment, often largely inapplicable. In one industry—an extreme case—establishments were prompted with questions about more than 350 commodity items. Most of these items were reported somewhere in the industry, but any given establishment would report only a small subset—the average was fewer than fifteen. This meant that the person completing the form would have had to spend considerable time going through the long list to find the commodity items that were applicable in their particular case.

Personalization is generally viewed as having been an innovative step towards reducing response burden (or at least perceived burden) and towards modernizing the questionnaire. Personalization is well regarded both by respondents and by subject matter staff. When its removal was discussed, the most frequent reaction was that this would be a backward step.

Personalization has been retained—in fact, it is expected that it will be extended into the module collecting employment by location. However, a number of negatives were raised in reassessing personalization as it already exists. (1) There was concern that some respondents were not reporting some commodities because the personalized questionnaire does not ask them the full set of possible commodities—and that the corresponding values were being left out altogether or were being included as part of a residual commodity category. This concern can be addressed, at least in part, through improvements to the commodity-selection algorithm—such as listing blocks of related commodities or using more than just one prior year's reporting history in generating each establishment's questionnaire. (2) The printing of the personalized questionnaires is dependent on the availability of completed data from the previous year's survey. The time-frame this imposes precludes printing in advance, and the current set-up does not permit use of the more user-friendly multi-colour formats. By going back to a common, industry-based commodity list, with a well-chosen set of commodity classes, questionnaire designers could have taken advantage of headings, short, hierarchical commodity descriptions, and multi-colour formats to produce a user-friendly—albeit somewhat longer—questionnaire. However, any deficiencies in appearance and timing have been considered to be more than offset by the relative shortness and high applicability of the personalized form. (3) Personalization also complicates the implementation of web-based electronic data reporting and the use of character-recognition technology for data capture. These complications can be addressed over time.
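The two algorithmic improvements named in point (1), seeding from several prior years of reporting history and pulling in blocks of related commodities, could take a shape like the following. This is a hypothetical sketch with invented data structures and codes, not Statistics Canada's production commodity-selection algorithm.

    # Hypothetical improved commodity selection for personalized forms:
    # start from multiple prior years, then add related-commodity blocks.
    def select_commodities(reporting_history, related_blocks, years=3):
        """reporting_history: sets of commodity codes reported per past year,
        newest first; related_blocks: maps a code to its block of related codes."""
        selected = set()
        for year_codes in reporting_history[:years]:  # more than one prior year
            selected |= year_codes
        for code in list(selected):                   # add blocks of related items
            selected |= set(related_blocks.get(code, ()))
        return sorted(selected)

    history = [{"3674.10"}, {"3674.10", "3674.25"}, {"3674.25"}]
    blocks = {"3674.10": ["3674.10", "3674.15"], "3674.25": ["3674.20", "3674.25"]}
    print(select_commodities(history, blocks))
    # ['3674.10', '3674.15', '3674.20', '3674.25']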

4.2. Split into two surveys?

Over the past few years, the suggestion was often made that the commodity sections of the long-form should be moved outside the main questionnaire—and on to a series of annexes or schedules. This would allow the survey to be conducted in two parts. It envisaged a standard "principal statistics" questionnaire for financial data and a set of personalized commodity annexes. It envisaged that the two parts might be sent separately—potentially at different times and to different people within each business organization. It was also envisaged that the capture, edit and imputation systems might be different.

Part of the rationale for this two-part approach was the belief that there were different sources for the two different types of information: the firm's accountant was believed to be the source for principal statistics information and the plant manager the source for commodity information. However, research conducted by ASM staff revealed that, in general, everything is routed through the accountant. Where the accountant needs further information, other sources are contacted. From the perspective of who completes the questionnaire, there is no merit in breaking the survey into two parts.

However, because of the processing-related attractions of having a two-part approach, three different formats were formally tested. The first had a set of commodity annexes that were physically separate from the financial questionnaire. The second took these same commodity annexes and moved them as a block into the financial questionnaire. The third took the commodity annexes and integrated their contents into the appropriate section of the financial questionnaire. Respondent reaction was generally negative to anything other than the third option—the fully-integrated approach already in use in the manufacturing questionnaires. The feeling was that the other formats involved a great deal of flipping back and forth, that the forms were cluttered with cross-references and instructions, that the same total might be requested more than once (and with potentially conflicting results), and that a separate annex might not be fully completed or returned. Conducting the survey in two parts is no longer a consideration.

Page 24: ASSESSING THE USABILITY OF AN ELECTRONIC … · 2.0 USABILITY AND USER-CENTERED DESIGN The system is the mechanism that has been constructed by people to perform tasks specified by

1657

4.3. Reworkings

A number of minor reworkings were undertaken. These represent new ways of asking for the same information. (1) The most visible of these is the altered sequence of questions—from the input-output flow of the original ASM questionnaire to the revenue-expenses flow of the UES questionnaire. The latter corresponds to the typical income statement approach. (2) The inclusion and exclusion items at the beginning of the input, output, inventory, and labour sections are now more explicit. They are in point form and are consolidated from both the questionnaire and the reporting guide. (3) The items related to the basis of reporting have been changed to try to encourage respondents to report on a shipments basis (rather than on a production basis, which would include additions to inventories) and a purchases basis (rather than a usage basis)—or to clearly identify the basis that is used. (4) For editing and analysis purposes—and to reduce the need for follow-up—respondents are requested to identify events that may have caused changes from previously-reported values. A pre-printed list of possibilities, sorted in descending order of likelihood, has been reworked slightly from the original ASM. Considerable write-in space is also provided. (5) For the same reasons as the "events" information, respondents have also been requested to provide the address of any websites they maintain.

4.4. Reporting Guides

The ASM has traditionally used reporting guides for its long-form and head office questionnaires. Respondents can consult the guides on an as-needed basis. Within the guides, there is a detailed explanation for each questionnaire item. Removing such explanations from the questionnaire shortens the form and reduces clutter. Long-time respondents, who know what is expected, prefer this approach. Unfortunately, in some cases, necessary consultation may not occur. In the UES, the use of guides has been the exception. In the case of the ASM, because of the considerable value it adds, the guide has been retained in the conversion to the UES.

5. THE FUTURE

Content and format are now reasonably final for reference year 2000, and focus has shifted to support applications and the actual running of the survey. For the future: (1) Statistics Canada has established a committee to address response burden. Its objective is to develop a framework within which to describe the purposes for which questions are asked, the availability of alternative sources, the difficulty in responding—all the sorts of questions described in Section 2 of this paper. (2) Statistics Canada is also developing a chart of accounts for its business surveys. The purpose is to better document the underlying concepts for the measurement of which data are being sought. Questionnaire wording can then be assessed in the light of those concepts. This also provides a structure for ensuring consistent content across surveys. (3) Formal field testing will continue. It will address changes made as a result of the actual reference year 2000 experience. And, on an ongoing basis, it will assess whether content remains relevant, whether respondents are providing the intended information, and whether the questions fit well with existing accounting practices.

6. CONCLUSIONS

The integration of the Annual Survey of Manufactures into the Unified Enterprise Survey has been accompanied by a number of improvements to the manufacturing questionnaire. Comprehensive financial data will now be collected, including the purchased service expenses information needed to produce, at the establishment level, a measure of value-added that takes those expenses into account. A number of items—all poorly reported—have been dropped. As a result, the content changes have had a neutral impact on response burden. In summary, the manufacturing questionnaire has profited considerably from a vigorous process of discussion, evaluation and debate which has encompassed both content and format issues and has involved worthwhile input from all affected parties—including respondents.


DISCUSSION: ALTERNATE RESPONSE MODES

Michael P. Cohen, U.S. National Center for Education Statistics¹

1990 K Street NW, 9th Floor, Washington DC 20006-1103 [email protected]

ABSTRACT

The papers of the session on "Alternate Response Modes" are discussed. These papers provide, for establishment surveys, careful treatments of data quality concerns involving the survey instrument and reporting.

Key Words: Data Quality, Reporting, Survey Instrument

1. INTRODUCTION

The unifying thread of these interesting papers is a serious concern for data quality with respect to the survey instrument and reporting. The session title might have been "Respondent Meets Survey Instrument."

As a statistician at a relatively small statistical agency, the U.S. National Center for Education Statistics (NCES), I have involvement with the matters dealt with in this session but am not a specialist in them. I shall be discussing the papers from this perspective. My remarks are certainly not comprehensive. I just bring up points where I have questions or particular comments.

2. SANER AND PRESSLEY

The paper by Lelyn D. Saner and Kimberly D. Pressley of the U.S. Bureau of the Census, "Assessing the Usability of an Electronic Survey Instrument in a Natural Use Context," is based on work for the U.S. Annual Survey of Manufactures (ASM). I liked the scholarly treatment of usability and recommend the paper to anyone doing research in this area.

Computerized Self-Administered Questionnaires (CSAQs) are becoming increasingly important as means of data collection and, I predict, this trend will continue. The ASM uses a CSAQ. NCES-sponsored data collections that make use of them include ones for libraries and postsecondary education institutions.

Surveys using CSAQs can incorporate information edits in which the user is alerted when the response to an item is suspicious or invalid. In the ASM CSAQ, the user can click on an icon for more information. As the authors note, this facility should have a positive effect on data quality. I would like to see research, though, on the effects, if any, of the information edits on item and unit nonresponse. My concern is that the user may become frustrated. This may especially be a problem in the case of "hard edits" in which the user is not allowed to submit a completed form until a discrepancy is resolved.

A promising new area for research is the analysis of event logs: "…as the user goes through the process of completing the survey, every time he or she clicks a button, makes a menu selection, or types something in using the keyboard, the action is recorded as an 'event.'" With such a high level of detail, there is an exciting potential for quantum leaps in our understanding of users' behaviors. (A minimal tallying sketch follows at the end of this section.)

An objective of the Saner and Pressley research is to compare a window-by-window "item-based" format for the CSAQ to a single scrollable screen or "form-based" format. I am curious; which one is better?

¹ This discussion is intended to promote the exchange of ideas among researchers. The views are those of the author, and no official support by the U.S. Department of Education is intended or should be inferred.
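The tallying sketch promised above: a minimal pass over an event log, using a record format invented here for illustration (real CSAQ logs will differ), to show the kind of behavioral counts such data make possible.

    # Minimal sketch of event-log analysis; the (timestamp, widget, action)
    # record format is invented for illustration.
    from collections import Counter

    log = [
        ("09:14:02", "item_12_entry", "keystroke"),
        ("09:14:10", "edit_info_icon", "click"),
        ("09:14:31", "item_12_entry", "keystroke"),
        ("09:15:03", "next_button", "click"),
    ]

    # How often did each widget receive the user's attention?
    events_per_widget = Counter(widget for _, widget, _ in log)
    print(events_per_widget.most_common())
    # e.g., [('item_12_entry', 2), ('edit_info_icon', 1), ('next_button', 1)]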


3. CLARRIDGE AND SCHARF

This work by Brian Clarridge and Lauri Scharf of the University of Massachusetts, Boston, Center for Survey Research, "Assessing the Usability of an Electronic Survey Instrument in a Natural Use Context," pertains to a study of primary medical care sites (physicians' offices, clinics, health centers) that serve Medicaid patients in Massachusetts. The authors obtained an alternative source of data on a subset of the sites (a directory supplied by the managed care organization Network Health). They make use of this and follow-up telephone calls as needed to do a careful and thoughtful data quality analysis. One of their particular concerns is proxy reporting, that is, reporting by an individual at a site on the work of others.

In reconciling the information from the two data sources, apparent discrepancies may be due to differences in the wording of similar questions. It would therefore be helpful to see the exact wording of the questions.

In the study, interviews were completed at 71% of the sites, obtaining information on 96% of the primary care physicians. The 96% is amazingly good, and the 71% is very respectable for this kind of survey. As a survey methodologist, though, I cannot help wanting to know more about the other 29% of the sites. I suspect that they are for the most part small operations.

4. STETTLER, MORRISON, AND ANDERSON

The paper by Kristin Stettler, Rebecca Morrison, and Amy E. Anderson of the U.S. Bureau of the Census, "Results of Cognitive Interviews Studying Alternative Formats for Economic Census Forms," applies cognitive techniques to improving the forms for the manufacturing sector of the U.S. Economic Census.

A fascinating problem that bears on the analysis is first-line bias. This bias arises because respondents put aggregate totals on the line intended for the first detailed product. At least for some respondents, "… the level of detail requested would be difficult to provide accurately. Most felt that they could provide reasonable estimates." The challenge is to coax the respondent into estimating the detail based on accurate information at higher levels of aggregation. Problems like this (but not exactly the same) arise in surveys of postsecondary education institutions and other places.

Why not have a line on the form for the aggregate numbers too? I think this is worthy of investigation.

This study provides an excellent illustration of the value of cognitive interviews in the establishment survey setting.

5. CRYSDALE

This study by John S. Crysdale of Statistics Canada, "Harmonizing Survey Content: Integrating Canada's Annual Survey of Manufactures into the Unified Enterprise Survey," is thoughtful and well designed. It systematically describes the many considerations that go into integrating a particular survey, in this case Canada's Annual Survey of Manufactures, into a unified survey system.

One interesting and important consideration was whether to retain personalization (also called customization) of the forms. In personalization, the forms are modified at the establishment level to reflect previous period reporting. This, of course, is very helpful to the respondent. A concern, though, as noted by Crysdale, is that "… some commodities were not being reported because respondents were not being asked about them." This definitely is a worthy topic for further research.

6. CONCLUSION

This session provides a strong indication of the progress being made in designing survey instruments for establishment surveys and on related issues. It is fascinating to think about what might be in a similar session at ICES-III.

