
Interactive Retrieval Using Simulated versus Real Work Task Situations: Differences in Sub-facets of Tasks and Interaction Performance

Yuelin Li
Department of Information Resources Management, Business School, Nankai University, Tianjin, China, 300071
[email protected]

Die Hu
Department of Information Resources Management, Business School, Nankai University, Tianjin, China, 300071
[email protected]

ABSTRACT

Task design is critical in interactive information retrieval (IIR) research, including IIR systems evaluation and IIR behavior research. Currently, two designs are most widely used: a simple search request and a simulated work task situation, especially the latter. This paper examines possible differences between simulated and real work task situations in a digital library evaluation, with respect to task characteristics, the influence of the tasks on users’ interactive information search behavior, and interaction performance. In an experiment, one simulated work task situation was assigned to forty-two participants, who were also asked to bring one real work task to the experiment. A set of questionnaires, including an entry questionnaire and pre- and post-search questionnaires, was administered to the participants, and an exit interview was conducted before the experiment ended. The results indicate that simulated and real work task situations are significantly different in some, but not all, of the sub-facets of task examined in this study. However, the influence of the two tasks on users’ interactive information search behavior and interaction performance is not significant, though the sub-facets of the real and simulated work task situations correlate with interaction performance differently. Therefore, well-designed simulated work task situations could supplement real work task situations in IIR evaluation. The study also found that, after the search, some participants assessed the simulated task as in fact easier, and the real task as in fact harder, than they had predicted. Moreover, controlling critical sub-facets of task should be taken into account when designing simulated work task situations. The paper concludes with suggestions on designing simulated work task situations in IIR research and on future studies.

Keywords

Simulated work task situations, real work task situations, interactive information retrieval, information systems evaluation.

INTRODUCTION

IR evaluation has been a long-term concern in information science and plays an important role in the improvement of IR systems. Many studies have evaluated different types of IR systems, including traditional IR systems, digital libraries, search engines, and so forth. To make IR evaluation effective, researchers have proposed different IR evaluation models that have been applied in evaluation practice, such as the system-oriented Cranfield model, user-centered IR evaluation models, and models combining both approaches, for example the IIR model proposed by Borlund and Ingwersen (1997) and Borlund (2000). For all of these models, task design is critical and to a great extent affects the final results of IIR research.

Before the concept of simulated work task situations was proposed, a simple description of a search request was prevalent in IR evaluation. A simple search request, however, ignores the context of information search that motivates user–IR system interaction. Context cannot be neglected, since it is an influential factor that shapes users’ information-seeking behavior (Courtright, 2007).

To improve IR evaluation, simulated work task situations are designed to be as close as possible to the real situations users face, and thus provide contexts that elicit users’ interaction with an IR system. Since the concept was proposed, it has been widely used in IR evaluation and in research on information searching behavior (Borlund, 2010). To verify the approach, some researchers have examined the concept in their evaluations; for example, Blomgren, Vallo, and Byström (2004) found that users performed better when searching for their real situation. However, according to Borlund and Schneider (2010), such validation studies are not enough, and further validation of the construct is still called for. Therefore, this study examines the differences between simulated and real work task situations when both are used in a digital library (DL) evaluation study. Because the focus is on the differences between the two types of task, the results of the DL evaluation itself are not reported.

Copyright is held by the author/owner(s). ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada.

LITERATURE REVIEW

Tasks and Information Seeking and Search

Tasks have been examined in various studies because they influence users’ information-seeking and searching behavior (Vakkari, 2003; Kim, 2009; Li and Belkin, 2010). Different task characteristics have been investigated, such as task complexity, task type, task stage, task product, and task interdependence.

According to Byström and Järvelin (1995), task complexity shapes users’ information-seeking behavior to a great extent. Bell and Ruthven (2004) found that participants could differentiate levels of complexity, and that several factors affected task complexity, including the useful information provided by the tasks, the type of information, and the amount of information. Gwizdka and Spence (2006) found that higher search effort, lower navigational speed, and lower search efficiency were good predictors of subjectively perceived post-task difficulty, and that objective task complexity affected users’ subjective judgments of task difficulty. Li and Belkin (2010) found that objective work task complexity affected almost all aspects of interactive information search behavior investigated. These studies illustrate that task complexity is an important factor that shapes users’ interaction with information systems and is, at the same time, itself influenced by different factors.

In addition to task complexity, several other task facets or characteristics have been examined. Li and Belkin (2008) reviewed studies of task in information science, social psychology, and organizational management and developed a faceted classification of task. This framework presents different facets, sub-facets, and values of task; the facet ‘user’s perception of task’ involves several sub-facets that reflect task attributes from the user’s perspective. The classification has informed empirical studies of task-related information seeking and search (e.g., Li, 2009; Li and Belkin, 2010). Li (2009) and Li and Belkin (2010) suggested that task should be treated as a multi-faceted variable and that its different facets should be considered when examining its influence on information search and retrieval. Liu and Belkin (2010) examined task stage and task type as a basis for improving the personalization of IR. Task type affects the relationship between task difficulty and user behavior and should be taken into account when predicting task difficulty (Liu, Gwizdka, Liu, and Belkin, 2010). Xie (2009) identified, from her data, different dimensions of work tasks, such as the nature, stages, and timeframe of the task, as well as dimensions of search tasks, such as the origination, type, and flexibility of the task. She found that the dimensions of both work tasks and search tasks affected what the participants planned and how they planned a search. Li (2012) also found that users’ perception of different sub-facets of task shaped their interaction with IR systems to different degrees.

Task Design and Simulated Work Task Situations

Considering the effect of task on users’ information behavior, it is critical to design appropriate tasks for effective IIR systems evaluation and behavior research. As mentioned above, the traditional approach is to design a simple search request, which is still employed in TREC experiments. With the development of user-centered information science, users play an increasingly important role in IIR evaluation and are incorporated into evaluation design.

To facilitate IIR research, Borlund and Ingwersen (1997) and Borlund (2000) proposed an IIR evaluation model composed of three parts: a set of components that ensures a functional, valid, and realistic setting for the evaluation of IIR systems; empirically based recommendations for the application of the concept of simulated work task situations; and a call for performance measures that are alternatives to binary relevance assessment. To ensure the effectiveness of the model, Borlund (2000) specifically examined simulated work task situations and pointed out that a good simulated work task situation should reflect three main characteristics:

• The situation has to be one to which the test persons can relate and with which they can identify;

• The topic of the situation has to be of interest to the group of test persons; and

• The situation has to provide enough imaginative context in order for the test persons to be able to apply the situation.

(Borlund, 2000, p.86)

Therefore, simulated work task situations should be carefully designed in consideration of the participants’ backgrounds.

Since simulated work task situations were proposed, many studies have used them to conduct IIR research (Borlund, 2010). Blomgren, Vallo, and Byström (2004) evaluated an IR system by testing a real and a simulated work task situation respectively. Kim (2009) examined how task can predict users’ information search behavior on the web: she identified three task types (factual, interpretive, and exploratory) and designed three simulated work task situations corresponding to each type. The study found that the frequencies and patterns of information-seeking strategies differ significantly across task types. Using simulated work task situations, Yuan and Belkin (2010a) evaluated four IIR systems, each supporting specific information-seeking strategies and constructed from a different combination of IR techniques. Also based on simulated work task situations, they evaluated an integrated system that supports multiple information-seeking strategies (Yuan and Belkin, 2010b).

To make simulated work task situations valid and effective, Li and Belkin (2010) adopted the faceted classification of task (Li and Belkin, 2008) to develop simulated work task situations. Based on a pre-study using semi-structured in-depth interviews to identify the real work tasks of the targeted user group (Li, 2009), and considering the possible influence of different facets of task on users’ interaction with IIR systems, they controlled the facets of task that had not been found significant in shaping users’ search tasks (Li, 2009) and varied objective task complexity and task product to construct the simulated work task situations. This approach ensures that the simulated work task situations are very close to real ones and meet Borlund’s (2000) recommendations. However, though the experiment was effective for its research goal, designing simulated work task situations in this way is time-consuming, and it is impractical for every study to do so.

Though many studies have designed simulated work task situations, Borlund and Schneider (2010) pointed out that the recommendations for constructing them have not always been followed exactly, which may bias research results. Moreover, beyond the three recommendations proposed by Borlund (2000), are there other suggestions for the design of simulated work task situations? This still needs to be examined.

In addition to designing tasks via simulated work task situations, other approaches have been explored. For example, Kules and Capra (2008) recognized that designing well-grounded and realistic exploratory search tasks is a challenge for study design. To construct exploratory tasks for IIR evaluation, they proposed a two-step approach based on examining log data and developing a task template: they first extracted topics from log data and then plugged the topics into a task template, creating candidate tasks. Next, they conducted a set of searches to refine the tasks and to make sure the tasks were appropriate for the study. The purpose of this method is to identify users’ real exploratory search tasks for IIR evaluation.

To sum up, the literature indicates that different task characteristics affect users’ information search behavior, and that simulated work task situations have been widely used but need more verification. Compared with other techniques, simulated work task situations have the strength that they take contextual influence into account and thus support users’ interaction with IR systems. However, it is still necessary to investigate how to design simulated work task situations so that they are as close as possible to the real work tasks of the targeted user group.

RESEARCH QUESTIONS

Previous studies indicate that, when conducting IR evaluation or research on information-seeking or search behavior, researchers should consider different facets of task. Thus, to examine the differences between the two types of work task, this study examines different sub-facets of them in a DL evaluation study. Also, to articulate the differences made by simulated and real work task situations, this study explores users’ interactive behavior and their perception of search performance, that is, interaction performance. Specifically, this study investigates the following research questions:

Q1: What are the differences between simulated and real work task situations in terms of users’ perception of sub-facets of task?

Q2: Do simulated and real work task situations make a difference in users’ interactive information search behavior?

Q3: What are the differences in users’ interaction performance when they search for simulated and real work task situations?

In this way, the study addresses how to design simulated work task situations for IR evaluation and information behavior research.

RESEARCH METHOD

An experiment was conducted to evaluate the performance of a digital library (DL) in China. To this end, the study selected CNKI as the experimental system. CNKI (China National Knowledge Infrastructure) is one of the best-known and most widely used DLs in China. It covers most academic journals in various disciplines and, to some extent, represents the achievements of digital library development in China.

Experimental Design

The experiment was conducted in the real settings where students use digital libraries, that is, in dorms or classrooms. A laptop with a screen-recording tool, Video Screens Experts (V7.5), was used to record users’ interaction with the DL. To limit the influence of educational level on users’ information search behavior, we recruited only undergraduate students. In total, 42 participants were recruited through notices distributed to dorms, cafeterias, classrooms, and so on.

The study used two tasks: one designed simulated work task situation, and one real work task that the participants needed to complete soon or were currently working on. The simulated one is described as follows:


Simulated work task situation: Imagine you are a representative to the National People’s Congress. The congress will meet soon; you are interested in the reform of the value-added tax and would like to prepare a proposal on it. To that end, you need to learn about the research on this topic and experts’ different viewpoints, and attempt to incorporate them into your proposal.

Search task: Try your best to search CNKI and save useful search results. You will have 15 minutes to search.

At the time the experiment was being prepared, the National People’s Congress was about to meet, and the value-added tax was a hot topic, discussed in various online and offline media; the students were also interested in the issue. The simulated work task situation was drafted based on an informal interview with students about topics they were interested in.

The experiment also asked the participants to conduct a search for a recent work task of their own. The real situation was described as follows:

Real work task situation: Please select a task you need to finish soon, such as a paper, an assignment, a research project, or something similar, for which you need to search for information.

Search task: Try your best to search CNKI and save useful search results. You will have 15 minutes to search.

The order of the two tasks was rotated across participants to avoid learning effects. To collect data, we developed a set of instruments, including a consent form, an entry questionnaire, a pre-search questionnaire, a post-search questionnaire, and an evaluation questionnaire. After the participants finished searching for the two tasks, an exit interview about the search process and the evaluation of CNKI was conducted.

The experiment was conducted in April 2012, and each session lasted one hour on average. Again, given the purpose of this paper, the evaluation of CNKI is not reported here.

Data Collection and Analysis

This study examined more sub-facets of task than those identified in the faceted classification of task (Li and Belkin, 2008), in order to compare more comprehensively the simulated and real work task situations used in the same IR evaluation study. Before the search, the participants were asked to complete a pre-search questionnaire. On 7-point Likert scales, they judged task complexity (subjective task complexity), topic familiarity, search experience with similar tasks, confidence in locating useful information for the task, task difficulty, task goal clarification, task urgency, and knowledge of the method to complete the task. In this study, these aspects are called “pre-search task characteristics”.

After the search, a post-search questionnaire asked the participants to judge, also on 7-point Likert scales, the difficulty of relevance judgment, the effort needed to locate useful information (measured by “whether one needs to browse a lot of documents before locating a useful one”), the demanded thinking and problem-solving skills, and the sufficiency of the information gathered for task completion. This study calls these aspects “post-search task characteristics”. These characteristics are related to task complexity (Maynard and Hakel, 1997).

Users’ perception of task difficulty has been found to change between before and after searching (Liu, Liu, Yuan, and Belkin, 2011). To measure users’ perception of task complexity and difficulty more precisely, the study therefore used separate measures before and after searching. Rather than repeating the term “task complexity” from the pre-search questionnaire, we used a set of items (the post-search task characteristics) to measure task complexity after the search. For task difficulty, the participants judged task difficulty before the search; after the search, they assessed whether their pre-search judgment had been “precise”, or whether the task was “harder than pre-judgment” or “easier than pre-judgment”.
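As a small illustration of how this pre/post difficulty measure can be tabulated (a hypothetical sketch; the file and response coding below are not from the paper), the share of “precise”, “harder than pre-judgment”, and “easier than pre-judgment” answers per task condition reduces to a simple cross-tabulation:

```python
# Hypothetical tabulation of the post-search difficulty re-judgments
# per task condition; the CSV file and column names are illustrative.
import pandas as pd

responses = pd.read_csv("post_search.csv")  # assumed layout: one row per
# participant per task, with columns "task" ("real"/"simulated") and
# "difficulty_rejudgment" ("precise"/"harder"/"easier").

share = (responses
         .groupby("task")["difficulty_rejudgment"]
         .value_counts(normalize=True)  # proportion within each task
         .mul(100)
         .round(1))
print(share)  # percentages comparable to those reported in the Results
```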

The recordings of the experiment were examined to extract the related data, and SPSS 19.0 was used to analyze the data. One-way ANOVA and t-tests were performed to examine the differences between the simulated and real work task situations in users’ perception of different sub-facets of task, in their influence on interactive information search behavior, and in users’ interaction performance.
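For readers who want to replicate this kind of analysis outside SPSS, the sketch below shows the between-task comparison in Python with scipy. It is a minimal illustration under assumed data layout and column names, not the authors’ procedure; note that with two groups the one-way ANOVA F statistic equals the square of the corresponding independent-samples t statistic.

```python
# Minimal sketch of the between-task comparisons described above, using
# Python/scipy rather than SPSS. The CSV file and column names are
# hypothetical; the paper's actual data are not available.
import pandas as pd
from scipy import stats

ratings = pd.read_csv("pre_search_ratings.csv")  # assumed layout: one row
# per participant per task, a "task" column ("real"/"simulated"), and one
# 7-point Likert rating per sub-facet.

sub_facets = ["topic_familiarity", "search_experience", "confidence",
              "task_difficulty", "goal_clarification"]

for facet in sub_facets:
    real = ratings.loc[ratings["task"] == "real", facet]
    simulated = ratings.loc[ratings["task"] == "simulated", facet]
    # One-way ANOVA across the two task conditions (F = t**2 here).
    f_stat, p_value = stats.f_oneway(real, simulated)
    df_within = len(real) + len(simulated) - 2
    print(f"{facet}: F(1,{df_within}) = {f_stat:.2f}, p = {p_value:.3f}")
```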

Characteristics of Participants

Among the participants, 33.3% were male and 66.6% female, from 17 majors. Half of the participants were from LIS, and 54.8% had received professional information search training, for example, by taking related courses or participating in related seminars.

In the entry questionnaire, the participants were required to report their search experience with CNKI (EX), their degree of familiarity with CNKI (FA), their frequency of using CNKI (FR), and their self-assessed search skills (SS) and search performance (SP) in CNKI. Based on the number of years they had searched online, the participants were categorized as having fair experience (less than one year), being experienced (more than one year but less than three years), or having much experience (more than three years). The participants assessed their search skills and performance on 7-point Likert scales. In terms of frequency of using CNKI, the participants fell into three groups: low (less than once a week), fair (one or two times a week), and high (more than three times a week).

EX: experience; FA: familiarity; FR: frequency; SS: search skills; SP: search performance

EX   Fair: 6 (14.3%)                   Experienced: 29 (69%)   Much: 7 (16.7%)
FA   Not familiar (1,2,3): 10 (23.8%)  Fair (4): 23 (54.8%)    Familiar (5,6,7): 9 (21.4%)
FR   Low: 5 (11.9%)                    Fair: 27 (64.3%)        High: 10 (23.8%)
SS   Low (1,2,3): 18 (42.9%)           Fair (4): 16 (38.1%)    High (5,6,7): 8 (19%)
SP   Not successful (1,2,3): 3 (7.1%)  Fair (4): 8 (19%)       Successful (5,6,7): 31 (73.9%)

Table 1. Characteristics of Participants

Table 1 shows that most participants were experienced in using CNKI but only fair in their familiarity with it and frequency of using it. Most participants assessed their search skills as low or fair; however, most assessed their search performance as successful. These self-assessments seem contradictory.

RESULTS

This section reports the participants’ perception of the simulated and real work task situations in terms of different sub-facets of task, compares the participants’ interactive information search behavior under the two work task situations, and examines possible differences in users’ search performance.

Task Characteristics

Sub-facet (pre-search)   Real (N=42)    Simulated (N=42)   F
Topic familiarity        4.21 (1.69)    2.38 (1.378)       F(1,82)=29.69*
Search experience        3.98 (1.423)   2.45 (1.329)       F(1,82)=25.73*
Confidence               4.76 (1.265)   3.88 (1.310)       F(1,82)=9.83*
Task difficulty          3.69 (1.199)   4.62 (1.248)       F(1,82)=-12.08*
Goal clarification#      2.81 (1.383)   3.60 (1.712)       F(1,82)=-5.35*

Values are mean (SD).
#: “1” means clear, “7” means extremely vague
*: p<.05

Table 2. Task characteristics (1)

One-way ANOVA indicated that the simulated and real situations were significantly different in topic familiarity, search experience, confidence, task difficulty, and task goal clarification. Table 2 shows the means and standard deviations of the participants’ ratings of the real and simulated tasks, as well as the ANOVA test values.

The results indicate that the participants were significantly more familiar and experienced with the real work task. They were more confident that they could locate useful information for the real situation than for the simulated one, and the goal of the real task was clearer to them than that of the simulated one.

The real work tasks were tasks the participants needed to finish soon, such as writing a paper, finishing an assignment, or conducting a research project, and their topics were related to the participants’ majors and courses. This suggests that participants tend to perceive intimate, at-hand tasks as more familiar, less difficult, and clearer in terms of task goals, and to feel more experienced with and more confident about them.

Table 3 indicates that the simulated and real situations show no significant difference in the other sub-facets of task examined: task complexity, task urgency, difficulty in relevance judgment, demanded thinking and problem-solving skills, effort to locate useful information, and sufficiency of information gathered, though there is a trend toward a significant difference in knowledge of task procedure.

Time data collected   Sub-facet                                  Real (N=42)    Simulated (N=42)
Pre-search            Task complexity                            3.95 (1.378)   4.50 (1.366)
                      Knowledge of task procedure                4.52 (1.215)   3.95 (1.513)
                      Urgency                                    4.48 (1.756)   3.88 (1.517)
Post-search           Difficulty in relevance judgment           3.02 (1.388)   3.52 (1.55)
                      Demanded thinking and problem-solving skills   4.88 (1.485)   5.17 (1.513)
                      Efforts to locate useful information       4.45 (1.485)   4.69 (1.585)
                      Sufficiency of information gathered        4.14 (1.539)   4.14 (1.475)

Values are mean (SD).

Table 3. Task characteristics (2)

Though no significant differences were found for these sub-facets, the real task was rated easier and more urgent than the simulated one, and the simulated situation required more difficult relevance judgments, more thinking and problem-solving skills, and more effort to locate useful information.

Comparing Table 2 with Table 3, significant differences between the two situations in sub-facets of task were found for the pre-search task characteristics, while no significant differences were found for the post-search task characteristics. This may indicate that the difference between the simulated and real work task exists only before the search: at that point, the participants were engaging with the simulated work task situation for the first time, whereas they had already engaged with the real one.

The results also indicate that, for the real situation, 35.7% of the participants correctly predicted task difficulty before the search, 59.3% regarded the task as in fact harder than their prediction, and only 4.8% regarded it as easier than their prediction. For the simulated situation, the percentages were 31%, 40.5%, and 28.6%, respectively. That is, a higher percentage of participants perceived the real situation as harder after the search than before, while a higher percentage assessed the simulated situation as in fact easier after the search than before.

Interactive Information Search Behavior

The study measured users’ interactive information search behavior by the number of queries issued, the number of search fields used, the number of unique search queries, the number of result pages viewed, the number of documents downloaded, and the mean length of queries. The data were extracted from the recordings. Because one recording of the experiment (P13) was damaged, that participant’s data were dropped; therefore, 41 participants’ data were extracted and analyzed. t-tests indicated no significant differences between the simulated and the real task on any of these measures. Table 4 shows the means and standard deviations.

Measure of interactive behavior            Simulated (N=41)   Real (N=41)
Number of queries issued                   4.66 (4.39)        5.44 (4.68)
Number of search fields used               2.07 (1.08)        1.98 (.91)
Mean length of search queries              6.61 (3.30)        6.57 (2.97)
Number of unique search queries            3.85 (3.00)        3.68 (2.82)
Number of pages of search results viewed   10.22 (8.90)       9.49 (7.93)
Number of documents downloaded             8.85 (5.81)        8.24 (6.85)

Values are mean (SD).

Table 4. Interactive behavior in terms of the simulated and real situation
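The paper does not state whether these t-tests were paired or independent. Since the same 41 participants completed both tasks, a paired test is a natural reading; the sketch below (with a hypothetical file and hypothetical column names) shows that variant in Python/scipy.

```python
# Sketch of the t-tests on the behavioral measures in Table 4, assuming
# paired samples (the same 41 participants completed both tasks).
# The data file and column names are illustrative, not the authors'.
import pandas as pd
from scipy import stats

behavior = pd.read_csv("search_logs.csv")  # assumed layout: one row per
# participant, with "<measure>_sim" and "<measure>_real" columns.

measures = ["queries_issued", "search_fields_used", "query_length",
            "unique_queries", "result_pages_viewed", "docs_downloaded"]

for m in measures:
    # Paired t-test: simulated vs. real condition for each measure.
    t_stat, p_value = stats.ttest_rel(behavior[f"{m}_sim"],
                                      behavior[f"{m}_real"])
    print(f"{m}: t(40) = {t_stat:.2f}, p = {p_value:.3f}")
```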

Interaction Performance

Interaction performance was measured by the participants’ perception of search success, frustration, and satisfaction: they were asked to judge the degree of success of the search, their experience of frustration, and their satisfaction with the search process. Table 5 shows the results.

Interaction performance   Real (N=42)    Simulated (N=42)
Success                   4.67 (1.588)   4.26 (1.326)
Frustration#              4.31 (1.732)   3.86 (1.441)
Satisfaction              4.64 (1.445)   4.60 (1.083)

Values are mean (SD).

#: “1” means completely disagree with “No frustration”; “7” means completely agree with “No frustration”

Table 5. Interaction performance in terms of the real and simulated situation

With respect to interaction performance, the two tasks did not lead to any significant differences in this study, though the participants perceived the search for the real work task situation as more successful and less frustrating; in terms of satisfaction, their perceptions were almost the same. Therefore, in this study, whether the task was simulated or real did not affect users’ perception of their interaction performance.

To further examine the differences between the simulated and the real work task situation, we performed Pearson correlation tests to see whether there are significant correlations between different sub-facets of task and users’ interaction performance. Table 6 and Table 7 show the significant correlations for the real and simulated situations before and after the search, respectively.
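These correlation tests are likewise easy to reproduce outside SPSS. The sketch below is a hypothetical illustration of computing Pearson correlations per task condition and printing only the significant ones, as Tables 6 and 7 do; the data file and column names are assumptions, not the authors’ instrument.

```python
# Sketch of the Pearson correlations between sub-facets of task and the
# three interaction-performance measures, computed per task condition.
# The CSV file and column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

data = pd.read_csv("questionnaires.csv")  # assumed layout: one row per
# participant per task, with Likert ratings and performance judgments.

facets = ["topic_familiarity", "search_experience", "confidence",
          "task_difficulty", "task_complexity", "knowledge_of_procedure"]
performance = ["success", "frustration", "satisfaction"]

for task in ("real", "simulated"):
    subset = data[data["task"] == task]
    for facet in facets:
        for perf in performance:
            r, p = stats.pearsonr(subset[facet], subset[perf])
            # Report only significant correlations, as in Tables 6 and 7.
            if p < 0.05:
                print(f"{task}: {facet} ~ {perf}: r = {r:.3f}, p = {p:.3f}")
```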

Table 6 indicates that, before the search for the real task, success was significantly correlated with participants’ confidence, task difficulty, and task complexity. Frustration was significantly correlated with topic familiarity, search experience, and knowledge of task procedure, while satisfaction was significantly correlated only with confidence and knowledge of task procedure. For the simulated task, users’ satisfaction was significantly correlated with task difficulty and knowledge of task procedure. Therefore, for the real task, confidence and knowledge of task procedure seem more powerful than other sub-facets in affecting participants’ interaction performance, because they are significantly correlated with more aspects of interaction performance.


                   Sub-facet of task             Success   Frustration   Satisfaction
Real (N=42)        Topic familiarity                       .318*
                   Search experience                       .389*
                   Confidence                    .385*                   .313*
                   Task difficulty               -.376*
                   Task complexity               -.531**
                   Knowledge of task procedure             .396**        .387*
Simulated (N=42)   Task difficulty                                       -.315*
                   Knowledge of task procedure                           .330*

*p<.05; **p<.01

Table 6. Correlation coefficients between pre-search task characteristics and interaction performance

                   Sub-facet of task                     Success   Frustration   Satisfaction
Real (N=42)        Difficulty in relevance judgment      -.450**
                   Sufficiency of information gathered   .539**
Simulated (N=42)   Difficulty in relevance judgment      -.472**                 -.321*
                   Sufficiency of information gathered   .542**                  .373*

*p<.05; **p<.01

Table 7. Correlation between post-search task characteristics and interaction performance

Table 7 shows the correlations between users’ perception of the tasks after the search and their interaction performance, for the real and simulated work task situations. The results indicate that, for both tasks, success was significantly correlated with difficulty in relevance judgment and with sufficiency of information gathered. For the simulated task, satisfaction was also significantly correlated with these two sub-facets. No sub-facets of task were found to be significantly correlated with frustration.

Based on these data, for the real work task situation, more task characteristics were found to be significantly correlated with users’ interaction performance. Moreover, compared to the real work task situation, users’ interaction performance on the simulated one seems more related to the post-search task sub-facets, since difficulty in relevance judgment and sufficiency of information gathered are significantly correlated with more aspects of interaction performance, and less related to the pre-search task sub-facets. This indicates that the influence of sub-facets of task should be considered when designing simulated work task situations, since for real work task situations the correlation between sub-facets of task and performance is evident.

DISCUSSION

By examining Q1, Q2, and Q3, this study investigated the differences between simulated and real work task situations in a DL evaluation. With regard to Q1, there are significant differences between the simulated and real work task situations in some sub-facets of task. However, no significant difference was found in their influence on either users’ interactive behavior or interaction performance. This means that a simulated work task situation can work as well as a real one in IIR evaluation, and the bias may not be significant. The results therefore support Borlund and Ingwersen (1997) and Borlund (2000): well-designed simulated work task situations can effectively take the place of real work tasks in IIR evaluation. In terms of the evaluation of CNKI, the task design of the experiment was effective, and the two tasks can be viewed as homogeneous, rather than heterogeneous, in nature.

Though various studies indicate that facets or dimensions of task shape users’ information-seeking behavior (Xie, 2009; Li, 2012), this study indicates that not all facets of task make a significant difference. With regard to Q2 and Q3, the significant differences between the simulated and real work task situations in topic familiarity, search experience, confidence, task difficulty, and goal clarification did not make users’ interactive information search behavior and interaction performance significantly different, though these findings conflict with previous studies, especially in terms of task difficulty. However, the results support the view that different sub-facets of task shape users’ interaction performance to different degrees (Li, 2010). More studies with appropriate sample sizes and robust research designs are necessary to resolve this issue.



As mentioned above, several studies, for example, Byström and Järvelin (1995) and Li and Belkin (2010), indicate that task complexity is a critical factor that greatly shapes users’ interactive information searching or seeking behavior. Interestingly, the results indicate that the simulated and real work task situations in this study were not significantly different in task complexity; meanwhile, no significant differences were found in users’ interactive information search behavior or their interaction performance. In light of the findings of previous studies (Byström and Järvelin, 1995; Bell and Ruthven, 2004; Li and Belkin, 2010), task complexity may play a great role here: the non-significant differences in users’ interactive behavior and interaction performance may follow from the non-significant difference in complexity between the two tasks. This suggests that controlling critical sub-facets of task, such as task complexity, when designing simulated work task situations for IIR research may be necessary in order to minimize the gap between simulated and real work task situations. Since task complexity has been found to affect users’ information search behavior in different studies, the degree of task complexity of simulated work task situations should correspond to that of real ones: for example, the low, medium, and high complexity levels of simulated work task situations should not differ significantly from the low, medium, and high complexity levels of real work task situations, respectively. In fact, Bell and Ruthven (2004) designed tasks of different complexity levels in their study, but they did not examine whether those complexity levels corresponded to those of the participants’ real work tasks. By controlling critical sub-facets of task and also considering the characteristics identified by Borlund (2000), simulated work task situations can be made more effective for IIR evaluation and behavior research.

Task difficulty is a widely examined factor in IIR research (Liu, Liu, Yuan, and Belkin, 2011). Some studies use it and task complexity interchangeably, while others advocate that they are different constructs (Li and Belkin, 2010) and should be treated distinctly. This study supports the latter view: task difficulty was significantly different between the simulated and real work task situations, but task complexity was not; and for the simulated work task situation, task difficulty was significantly correlated with satisfaction, but task complexity was not. Also, based on the post-search data, a large percentage of participants judged task difficulty differently before and after the search, whereas the participants’ judgments of task complexity (measured by the post-search characteristics) did not differ significantly before and after the search. This may mean that task complexity is a more stable construct than task difficulty. However, more investigation of these two important constructs is necessary, including how to define and measure them.

This study also suggests that pre- and post-search data should be used carefully in IIR research, especially when measuring participants’ perceptions. In this study, the participants judged task difficulty differently before and after the search. The results suggest that, before the search, users tend to perceive tasks that are more intimate to them, or right at hand, as easier, and other tasks as harder; that is, their perception may change after the search. This supports the findings of Liu, Liu, Yuan, and Belkin (2011). On one hand, it is possible that participants’ perceptions of task difficulty are vague before the search: if they are familiar with a task, they judge it easier; otherwise, they judge it harder. After they search for the task and learn more about it, their perception becomes clearer and their judgment more precise. Under this circumstance, the post-search data will be closer to the participants’ real cognitive state. On the other hand, the participants may judge task difficulty correctly before the search based on their cognitive state at that moment; however, searching is essentially a learning process, and after the search the participants gain new knowledge that helps reduce task difficulty. From this angle, the instrument design in IIR research should also monitor changes in the participants’ cognitive state during the experiment. In this way, the data can describe the participants’ cognitive state more accurately, and researchers can better use it for different research goals.

CONCLUSIONS

This study examined the differences between simulated and real work task situations used in IIR research and attempted to determine whether simulated work task situations can work as well as real ones for the evaluation of digital libraries. An experiment was conducted to evaluate a digital library in China, CNKI. A simulated work task situation, designed on the basis of an informal interview, was assigned to the participants; they were also asked to search for a real work task. Data on different sub-facets of task were collected before and after the search. The results indicate significant differences in some pre-search task characteristics. However, no significant differences were found between the two task situations in post-search task characteristics, users’ interaction performance, or interactive information search behavior. This indicates that the simulated work task situation and the real one did not, in fact, differently affect users’ interaction with the digital library.

Thus, this study supports the effective use of simulated work task situations in IIR research. Designing simulated work tasks should follow the recommendations proposed by Borlund (2000). A clear understanding of the real work tasks of the targeted user group, and appropriate control of critical sub-facets of task, should also be taken into account when developing simulated work task situations. IIR researchers should also carefully design their instruments in order to collect data that accurately describe users’ cognitive state, and should use the appropriate data in light of their research purposes.


However, this study has several limitations. First, since only one simulated and one real work task situation were used, the research results may be biased. Second, the participants were undergraduate students, who might use CNKI less frequently than graduate students, and the study did not consider the participants’ experience with CNKI, which may influence their behavior and performance.

Future studies will continue to evaluate digital libraries in China, especially their interaction design. For that, a refined research design will be developed based on this study. For example, future work will recruit participants at different levels, such as undergraduate, master’s, and doctoral students who frequently use digital libraries for different purposes. In terms of task design, more simulated work task situations based on the real work tasks of students at different levels will be developed, and different levels of task complexity corresponding to those of real work tasks will be taken into account. For that, a pre-study to identify the complexity levels of different real work tasks is also necessary.

ACKNOWLEDGMENTS

The research is sponsored by the National Social Science Foundation of China (Grant No. 11BTQ009). Our thanks go to Xiaoyun Tong for data collection and to the participants of this study. We also thank the reviewers for their very helpful comments.

REFERENCES

Belkin, N. J., Marchetti, P. G., & Cool, C. (1993). BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing & Management, 29(3), 325–344.

Bell, D., & Ruthven, I. (2004). Searcher’s assessments of task complexity for web searching. In S. McDonald & J. Tait (Eds.), European Conference on Information Retrieval (ECIR 2004), Lecture Notes in Computer Science, Vol. 2997, 57–71.

Blomgren, L., Vallo, H., & Byström, K. (2004). Evaluation of an information system in an information seeking process. Lecture Notes in Computer Science (Vol. 3232, pp. 57–68). London: Springer.

Borlund, P. (2000). Experimental components for the evaluation of interactive information retrieval systems. Journal of Documentation, 56(1), 71–79.

Borlund, P., & Ingwersen, P. (1997). The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation, 53(3), 225–250.

Borlund, P., & Schneider, J. W. (2010). Reconsideration of the simulated work task situation: A context instrument for evaluation of information retrieval interaction. Proceedings of IIiX 2010, August 18–21, 2010, New Brunswick, New Jersey, USA.

Byström, K., & Järvelin, K. (1995). Task complexity affects information seeking and use. Information Processing & Management, 31, 191–213.

Courtright, C. (2007). Context in information behavior research. Annual Review of Information Science and Technology, 41(1), 273–306.

Gwizdka, J., & Spence, I. (2006). What can searching behavior tell us about the difficulty of information tasks? A study of web navigation. In A. Grove (Ed.), Proceedings of the 69th annual meeting of the American Society for Information Science and Technology (Vol. 43). Retrieved April 26, 2010, from http://comminfo.rutgers.edu/∼jacekg/publications/fulltext/ASIST2006_paper_final.pdf.

Kim, J. (2009). Describing and predicting information-seeking behavior on the Web. Journal of the American Society for Information Science and Technology, 60(4), 679–693.

Li, Y. (2010). An exploration of the relationships between work tasks and users’ interaction performance. Proceedings of the Annual Meeting of the American Society for Information Science and Technology, October 22–27, 2010, Pittsburgh, PA, USA. Retrieved September 16, 2011, from http://onlinelibrary.wiley.com/doi/10.1002/meet.14504701127/pdf.

Li, Y. (2012). Investigating the relationships between facets of work task and selection and query-related behavior. Chinese Journal of Library and Information Science, 5(1), 51-69.

Li, Y., & Belkin, N. J. (2010). An exploration of the relationships between work task and interactive information search behavior. Journal of the American Society for Information Science and Technology, 61(9), 1771–1789.

Li, Y., & Belkin, N. J. (2008). A faceted approach to conceptualizing tasks in information seeking. Information Processing & Management, 44(6), 1822–1837.

Li, Y. (2009). Exploring the relationships between work task and search task in information search. Journal of the American Society for Information Science and Technology, 60(2), 275–291.

Liu, J., Gwizdka, J., Liu, C., & Belkin, N. J. (2010). Predicting task difficulty for different task types. Proceedings of ASIST 2010, October 22–27, 2010, Pittsburgh, PA, USA.

Liu, J., Liu, C., Yuan, X., & Belkin, N. J. (2011). Understanding searchers’ perception of task difficulty: Relationships with task type. Proceedings of ASIST 2011, October 9–13, 2011, New Orleans, LA, USA.

Maynard, D. C., & Hakel, M. D. (1997). Effects of objective and subjective task complexity on performance. Human Performance, 10(4), 303-330.


Vakkari, P. (2003). Task-based information searching. Annual Review of Information Science and Technology, 37, 413-464.

Xie, I. (2009). Dimensions of tasks: Influences on information-seeking and retrieving processes. Journal of Documentation, 65(3), 339–366.

Yuan, X., & Belkin, N. J. (2010a). Investigating information retrieval support techniques for different information-seeking strategies. Journal of the American Society for Information Science and Technology, 61(8), 1543–1563.

Yuan, X., & Belkin, N. J. (2010b). Evaluating an integrated system supporting multiple information-seeking strategies. Journal of the American Society for Information Science and Technology, 61(10), 1987–2010.