The Role of Automated Categorization in E-Government Information Retrieval

Tanja Svarre & Marianne Lykke, Aalborg University, DK. ISKO conference, 8th of July, 2013.

Agenda

• Background of the study
• Theoretical framework
• Research methods
• Results
• Summary and closing remarks

Background to the search test

• Initiated and partially cofinanced by the Danish National IT and Telecom Agency

• Purpose: To investigate how automatic assignment of metadata can contribute to the intention of increased efficiency and effectiveness in (Danish) e-government

Building on indexing/categorization:

• Early Cranfield tests

Categorization is helpful:
• when the query is vague, broad, general, or ambiguous
• when result rankings are deficient (Käki, 2005)
• in supporting exploratory searches
• in understanding large search sets (Kules & Shneiderman, 2004; 2005)

Research methods

• Case study in the Danish Tax Authorities
• Search test:
  - Controlled lab test
  - Comparison test
  - Professional users
  - Domain-specific search tasks
  - Pre-test questionnaire
  - Log data
  - Post-search interview

Data: Search test

• System characteristics:
  - Prototype of the corporate intranet
  - www.skat.dk content and internal information

• 2 search systems:
  - Free text indexing (SYSTEM A)
  - Categorization (SYSTEM B)

• 32 test persons
• 3 controlled and 1 natural search task per session, 2 tasks per system

http://www.skat.dk/

Search test: General findings

Variables                                               System A         System B
                                                        Sessions N=64    Sessions N=64
                                                        Queries N=229    Queries N=335

Number of terms in queries (averages)                   2.25             2.43
Search filter ‘document type’ applied (percentages)     43.2             31.6
Number of sessions with reformulations (percentages)    65.6             82.8
Number of reformulations in sessions (averages)         2.58             4.23
Query success (percentages)                             30.6             21.5
Session success (percentages)                           89.1             84.4
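
The averages in the table above can be cross-checked against the session and query totals. The following is a minimal sketch, not the authors' analysis: it assumes that every query after the first one in a session counts as a reformulation, which the slides do not state explicitly but which is consistent with the reported figures. The function name `queries_per_session` is hypothetical.

```python
# Totals as reported on the slide (sessions and queries per system).
sessions = {"A": 64, "B": 64}
queries = {"A": 229, "B": 335}


def queries_per_session(system: str) -> float:
    """Mean number of queries issued per session for one system."""
    return queries[system] / sessions[system]


for system in ("A", "B"):
    per_session = queries_per_session(system)
    # Assumption: each query beyond the first in a session is a reformulation.
    reformulations = per_session - 1
    print(
        f"System {system}: {per_session:.2f} queries per session, "
        f"{reformulations:.2f} reformulations per session"
    )
```

Under that assumption the sketch gives 2.58 and 4.23 reformulations per session, the same values as the table reports for System A and System B.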

Success at task level

                     Sim1                      Sim2                      Sim3                      NWT                       Total
                     SysA         SysB         SysA         SysB         SysA         SysB         SysA         SysB         SysA         SysB
Session succeeded    15 (93.8)    16 (100.0)   15 (93.8)    9 (56.3)     16 (100.0)   16 (100.0)   11 (68.8)    13 (81.3)    57 (89.1)    54 (84.4)
Query succeeded      18 (58.1)    23 (33.3)    17 (30.4)    11 (9.7)     20 (27.8)    22 (25.6)    15 (21.4)    16 (23.9)    70 (30.6)    72 (21.5)

• At task level the success of the two systems differs

Task level results


Queries per session (averages)

            Sim1           Sim2           Sim3           NWT            Total
System A    1.94 (n=16)    3.50 (n=16)    4.50 (n=16)    4.38 (n=16)    3.58 (n=64)
System B    4.31 (n=16)    7.06 (n=16)    5.38 (n=16)    4.19 (n=16)    5.23 (n=64)
Total       3.13 (n=32)    5.28 (n=32)    4.94 (n=32)    4.28 (n=32)    4.41 (n=128)


Number of terms in queries (averages)

            Sim1            Sim2            Sim3            NWT             Total
System A    2.32 (n=31)     2.39 (n=56)     2.42 (n=72)     1.94 (n=70)     2.25 (N=229)
System B    2.54 (n=69)     2.88 (n=113)    1.79 (n=86)     2.39 (n=67)     2.43 (N=335)
Total       2.47 (n=100)    2.72 (n=169)    2.08 (n=158)    2.16 (n=137)    2.36 (N=564)
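
The query success percentages per task can be reproduced from the counts on the slides. This is a small sketch under one assumption: that the n values listed per task and system in the term-count figures are the number of queries issued. The names `succeeded`, `issued` and `success_rate` are illustrative only.

```python
# Successful queries per task and system ("Query succeeded" row).
succeeded = {
    "Sim1": {"A": 18, "B": 23},
    "Sim2": {"A": 17, "B": 11},
    "Sim3": {"A": 20, "B": 22},
    "NWT": {"A": 15, "B": 16},
}

# Assumption: the per-task n values are the numbers of queries issued.
issued = {
    "Sim1": {"A": 31, "B": 69},
    "Sim2": {"A": 56, "B": 113},
    "Sim3": {"A": 72, "B": 86},
    "NWT": {"A": 70, "B": 67},
}


def success_rate(task: str, system: str) -> float:
    """Percentage of queries in a task/system cell that succeeded."""
    return 100 * succeeded[task][system] / issued[task][system]


for task in succeeded:
    for system in ("A", "B"):
        print(f"{task} Sys{system}: {success_rate(task, system):.1f}%")

# Totals across tasks, for comparison with the "Total" column.
for system in ("A", "B"):
    total_ok = sum(cell[system] for cell in succeeded.values())
    total_n = sum(cell[system] for cell in issued.values())
    print(f"Total Sys{system}: {total_ok}/{total_n} = {100 * total_ok / total_n:.1f}%")
```

The largest gap between the systems appears in Sim2, where System B users issued about twice as many queries as System A users but fewer of them succeeded.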

Reformulations (total)

                           SysA          SysB
No reformulations          69 (30.1)     62 (18.5)
Category                   -             114 (34.0)
Query terms                97 (42.4)     47 (14.0)
Document type              28 (12.2)     8 (2.4)
Search operators           8 (3.5)       5 (1.5)
>1 types simultaneously    27 (11.8)     99 (29.6)
Total                      229 (100)     335 (100)
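
The percentages in the reformulation breakdown are each type's share of all queries in that system. A minimal sketch of that calculation follows, using only the counts from the slide; the dictionary and function names are hypothetical, and the category type is left out for System A because that system did not offer it.

```python
# Query counts per reformulation type, as reported on the slide.
reformulations = {
    "A": {
        "No reformulations": 69,
        "Query terms": 97,
        "Document type": 28,
        "Search operators": 8,
        ">1 types simultaneously": 27,
    },
    "B": {
        "No reformulations": 62,
        "Category": 114,
        "Query terms": 47,
        "Document type": 8,
        "Search operators": 5,
        ">1 types simultaneously": 99,
    },
}


def shares(system: str) -> dict[str, float]:
    """Share (in percent) of each reformulation type among a system's queries."""
    counts = reformulations[system]
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


for system in ("A", "B"):
    total = sum(reformulations[system].values())
    print(f"System {system} (N={total})")
    for kind, pct in shares(system).items():
        print(f"  {kind}: {pct:.1f}%")
```

The counts sum to 229 and 335, matching the query totals for the two systems.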

System B (cat.) omissions

                              Number of sessions    Number of successful
                              in system B           sessions system B
System B                      26 (40.6)             22 (40.7)
Combined system B sessions    38 (59.4)             32 (59.3)
Total                         64 (100.0)            54 (100.0)

System B (cat.) omissions

• Highly relevant documents are discovered before a category has been selected

• Relevant documents are located while waiting for B (cat.) to categorize search results

• Categorization is not relevant when few documents are retrieved

Summary

• Categorization is useful:
  - When employees do not possess extensive knowledge about the task at hand
  - In offering new perspectives on the composition of a query
  - In understanding facets of queries
• When task knowledge is present, categorization is used to support the assumptions of a correct search

Summary

• Categorization is omitted when:
  - Search results are limited
  - Relevant documents are ranked at the top of the results

National IT & Telecom Agency: Findings

• The participants start out with free text indexing and supplement with the other method when necessary

• The indexing methods compared are complementary

• To meet the variety of information needs, several indexing methods should be represented simultaneously

Thank you for your attention!

?
