
AUTHOR QUERY FORM

Journal: ACCINF
Please e-mail or fax your responses and any corrections to: Mohammed, Thoufiq
E-mail: [email protected]
Fax: +1 619 699 6721

Article Number: 321

Dear Author,

Please check your proof carefully and mark all corrections at the appropriate place in the proof (e.g., by using on-screen annotation in the PDF file) or compile them in a separate list. Note: if you opt to annotate the file with software other than Adobe Reader then please also highlight the appropriate place in the PDF file. To ensure fast publication of your paper please return your corrections within 48 hours.

For correction or revision of any artwork, please consult http://www.elsevier.com/artworkinstructions.

Any queries or remarks that have arisen during the processing of your manuscript are listed below and highlighted by flags in the proof. Click on the ‘Q’ link to go to the location in the proof.

Location in article: Query / Remark (click on the Q link to go)
Please insert your reply or correction at the corresponding line in the proof

Q1 Please confirm that given names and surnames have been identified correctly.

Q2 The country name “United States” has been inserted for all authors' affiliation. Please check, and correct if necessary.

Q3 Citation “Previts and Brown 1993” has not been found in the reference list. Please supply full details for this reference.

Q4 Citation “Rodgers and Fogarty 1997” has not been found in the reference list. Please supply full details for this reference.

Q5, Q10 The citation “Nobata 1999” has been changed to match the author name/date in the reference list. Please check here and in subsequent occurrences, and correct if necessary.

Q6, Q7, Q8 Gangolly and Wu, 2000 was mentioned here but not in the reference list, however Gangolly and Wu was included in the list but was uncited in the text. No similar reference was given therefore it was presumed that Gangolly and Wu, 2000 should be linked to the reference found in the list. Please check if appropriate.

Q9 The citation “Nobata (1999)” has been changed to match the author name/date in the reference list. Please check here and in subsequent occurrences, and correct if necessary.

Q11 Citation “Salton and Buckley, 1996” has not been found in the reference list. Please supply full details for this reference.

Q12 Please provide the complete details for footnote 8.

Q13 The citation “Tan et al., 2006” has been changed to match the author name/date in the reference list. Please check here and in subsequent occurrences, and correct if necessary.

Our reference: ACCINF 321 P-authorquery-v11


Q14 Please check placement of Table 9 (previously uncited).

Please check this box if you have no corrections to make to the PDF file. □

Thank you for your assistance.


Highlights

International Journal of Accounting Information Systems xxx (2014) xxx–xxx

Automatic classification of accounting literature

Vasundhara Chakraborty a, Victoria Chiu b,⁎, Miklos Vasarhelyi c

a Ramapo College of New Jersey, United States
b Rutgers Business School, United States
c AIS, Rutgers Business School, United States

• Semantic parsing, information retrieval and data mining techniques are applied.
• Keywords and abstracts of research are used as classification criteria.
• Decision trees and rule-based algorithms arrive at the most accurate results.
• Abstracts are better measures than keywords in classifying accounting literature.
• Expanding sample size of articles improves accuracy of automatic classification.

ACCINF-00321; No of Pages 1

1467-0895/$ – see front matter © 2014 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.accinf.2014.01.001

Contents lists available at ScienceDirect

International Journal of Accounting Information Systems

Please cite this article as: Chakraborty V, et al, Automatic classification of accounting literature, Int J Account Inf Syst (2014), http://dx.doi.org/10.1016/j.accinf.2014.01.001


ACCINF-00321; No of Pages 27


Automatic classification of accounting literature


Vasundhara Chakraborty a, Victoria Chiu b,⁎, Miklos Vasarhelyi c

a Ramapo College of New Jersey, United States
b Rutgers Business School, United States
c AIS, Rutgers Business School, United States

Article info

Article history:
Received 9 November 2012
Received in revised form 19 September 2013
Accepted 4 January 2014
Available online xxxx

Keywords:
Accounting literature
Automatic classification
Taxonomy
Attributes
Semantic parsing
Data mining

Abstract

This paper explores the possibility of using semantic parsing, information retrieval and data mining techniques to automatically classify accounting research. Literature taxonomization plays a critical role in understanding a discipline's knowledge attributes and structure. The traditional research classification is a manual process which is considerably time consuming and may introduce inconsistent classifications by different experts. Aiming at aiding this classification issue, this study conducted three studies to seek the most effective and accurate method to classify accounting publications' attributes. We found results in the third study most rewarding in which the classification accuracy reached 87.27% with decision trees and rule-based algorithms applied. Findings in the first and second studies also provided valuable implications on automatic literature classifications, e.g. abstracts are better measures to use than keywords and balancing under-represented subclasses does not contribute to more accurate classifications. All three studies' results also suggest that expanding article sample size is a key to strengthen automatic classification accuracy. Overall, the potential path of this line of research seems to be very promising and would have several collateral benefits and applications.

© 2014 Published by Elsevier Inc.

⁎ Corresponding author. E-mail address: [email protected] (V. Chiu).

1. Introduction

The purpose of this study is to develop an automatic classification method to identify the characteristics of accounting and accounting information system research by applying text analytic techniques. Literature taxonomization plays an important role in understanding the knowledge in a discipline; this classification technique assists researchers in examining the development of research areas and disciplines by categorizing documents in several dimensions. It has been used in prior literature to characterize research into specific research subfields in accounting (Birnberg and Shields, 1989; Meyer and Rigsby, 2001; Heck and Jensen, 2007) and summarize the development of major accounting and accounting information system journal publications (Brown et al., 1987; Brown et al., 1989; Vasarhelyi et al., 1988; Previts and Brown, 1993; Badua, 2005; Badua et al., 2011).

The traditional research classification method is a manual process where researchers and/or experts comprehend the article content first and subsequently assign attributes of the taxonomy to the designated manuscript. Categorization of manuscripts is typically based upon a literature taxonomy developed by researchers. The taxonomy enables scholars to arrive at specific characteristics of research in a given field by using it as an academic research index. While the extant research on accounting thought development and evolution has long applied the traditional manual classification method, concerns have been raised in the literature that it is fairly time consuming, costly, and could introduce inconsistent classifications by different researchers (Rodgers and Fogarty, 1997; Nobata et al., 1999; Gangolly and Wu, 2000; Krippendorff, 2004; Fisher et al., 2010). Enabling classifications of journal articles automatically could ease these concerns and benefit academic researchers and graduate students in a number of ways. Searches for publications on particular topical areas, research methods and other characteristics of interest could be conducted more efficiently. Summarization and implication of the literature development over time could be examined with enhanced productivity as well. Applying the technique of automatic research classification to accounting and accounting information systems research would support academicians in identifying literature characteristics and in locating research articles of interest more efficiently.

Acknowledging the criticality of literature taxonomization and the aforementioned research issue, the goal of this study is to refine the process of literature classification and taxonomization to benefit researchers in the accounting information systems and accounting discipline in using classification results. This study was conducted in three phases to address three main research questions: 1) Using only keywords provided in journal articles, how can the classification process of accounting literature be automated? 2) Using the abstracts of academic journal articles, how can we automate the process of classification of accounting literature? 3) Is the accuracy of classification of accounting literature impacted by the combination or choice of elements used in the automation process?

Aiming at seeking the most effective and precise automatic method that classifies accounting research published in multiple accounting and accounting information systems journals, semantic parsing and data mining tools are applied in this study to classify research articles by three taxonomic attribute categories¹ in three studies. Summarizing results from the three phases, we found phase three to be the most promising, with the highest automatic classification accuracy of 87.27% obtained by applying decision trees and rule-based algorithms. Findings from the first study suggest that article abstracts provide a better measure for automatic classification than keywords. The second study's results indicate that applying a balancing approach for under-represented subclasses does not necessarily contribute to a more accurate classification as expected. The third experiment's findings imply that a larger article sample size is very important for improving the accuracy of automatic classification.

Research characterization serves as valuable information for researchers aiming at revealing the development, trend and evolution of a certain research area or field of knowledge. Through the increasingly populated online and electronic databases, knowledge and data dissemination and communication have become much more prevalent than they were decades ago. The benefits of technological advancement in information retrieval and usage, however, still require more precise exploration and examination by researchers. The contribution of this automation technique on research characteristics classification would extend the usefulness of information retrieval availability by enabling researchers, graduate students and readers to arrive at the numerous characteristics of accounting and accounting information systems research promptly. A widened scope of research and learning needs would be supported; furthermore, research in the accounting taxonomy and thought development area could specifically benefit from the automatic classification technique by allowing an extensive set of research to be reviewed, categorized and examined with much less constraint due to manual effort limitation.

1 This study compared the automatic classifications with the manual classifications in Accounting Research Directory (ARD), specifically three categories: accounting area, treatment, and mode of reasoning. The ARD (Brown et al., 1994) published accounting literature classifications of 11 accounting and accounting information system journals; see the detailed description in the methodology section.

This study builds upon the literature of both accounting and information science disciplines and sheds light on the approach of sharpening literature classification of accounting and accounting information systems to move towards a more automatic stage. The theoretical ground of this study can be linked to the information technology research framework proposed by March and Smith (1995). More specifically, this study relates to the methodology “build” and “evaluate” components of the framework, as it provides insight on founding an automatic method that adds value for users of academic work. The future path of this line of research seems to be promising and comes with several collateral benefits and applications. The remainder of the study is structured as follows: The next section reviews relevant prior literature and addresses our main research questions. The third section introduces the research methodology and the fourth reports findings from the three studies. Discussion of the results summary, limitations and future research implications is provided in the last section.

2. Literature review and research questions

2.1. The traditional classification of accounting literature

Classification of research manuscript characteristics has historically been performed manually to identify specific research characteristics and reveal its overall development scope and trend. The manual classification process requires researchers to comprehend article content first and then assign its representing attributes. It is often necessary for this line of research to involve a heavy process of manual classification on the content of articles by researchers and/or experts, and it is fairly time consuming. Research attributes/characteristics development and paradigms evolution have been examined by many accounting researchers through analyzing the article content published in journals in the past decades (Chatfield, 1975; Dyckman and Zeff, 1984; Brown et al., 1987; Vasarhelyi et al., 1988; Brown et al., 1989; Heck and Jensen, 2007; Badua et al., 2011). This section surveys prior literature that applied the traditional classification approach in accounting and relevant business fields, with reviews on its main purpose and used approach as well.

A stream of literature has applied traditional classification on research articles with aims of revealing research attributes and knowledge evolution in publications of specific accounting journals. Chatfield (1975) examined the historical development of research published in the first fifty years of The Accounting Review (TAR) and identified four distinct stages of evolution by reviewing publications. Contributions of research published in the Journal of Accounting Research (JAR) were analyzed by Dyckman and Zeff (1984); the study concluded that improvement was made mainly on empirical research in accounting, capital markets and refined behavioral research areas, specifically between 1963 and 1982. Fleming et al. (2000) examined the evolution of accounting publications by categorizing research methods, financial accounting subtopics, citation analyses, length, and author background of The Accounting Review (TAR) journal from 1966 to 1985. Heck and Jensen (2007) is another article that examined TAR's research contribution by taxonomizing publications from 1926 to 2005 by authorship, research methods and accounting topics. Similar content analysis and taxonomization approaches were applied in Brown et al. (1987) to reveal research characteristics and development trend of Accounting, Organizations and Society (AOS) from 1976 to 1984. Just et al. (2012) recently examined publications of Accounting, Organizations and Society (AOS) from 1990 to 2007 with a combination of analyses on article content classification and other bibliometrics tools such as citation and co-citation analysis.

Other than examining a specific journal, prior literature also provided a comprehensive review of manuscripts by manually classifying publications from multiple outlets. Vasarhelyi et al. (1988) classified accounting literature of 2,136 articles published in six refereed accounting journals by its foundation discipline, school of thought, research methods, and mode of reasoning to systematically examine the historical evolution of research characteristics within 1963–1974. Brown et al. (1989) examined four taxonomies of over 1,100 studies of contemporary accounting literature published in AOS, TAR, JAE, and JAR from 1976 to 1984. Literature has also applied the traditional classification method to reveal characteristics of a particular scope of research, school of thought or methodology. Ijiri et al. (1968) surveyed and classified budgeting literature by organization type, budgeting aspects, application areas, and merit evaluation; Felix and Kinney (1982) reviewed auditing literature by a cross-classification approach; auditing literature is categorized by a three-class dimension including state description, model/theory development and hypothesis tests as well as a four-class dimension of auditor's opinion formulation processes. In the behavioral accounting research area, Hofstedt (1975, 1976), Birnberg and Shields (1989), and Meyer and Rigsby (2001) classified and examined behavioral studies' research content, research methods and authorship. Besides the accounting discipline, finance and information processing fields have applied classification to academic research to help reveal knowledge evolution in sub-fields and over time as well (Ball, 1971; Hakansson, 1973; Gonedes and Dopuch, 1974; Libby and Lewis, 1977; Ashton, 1982; Libby and Lewis, 1982). In summary, classification of literature over time has been a crucial and inevitable technique applied in accounting and information systems research that helps shed light on knowledge characteristics and evolution. While extant accounting thought development research has performed manual classification at large, concerns about and drawbacks of this approach have been raised, e.g. manual content analysis could introduce researcher subjectivity; high costs of manual processing limit sample size, the power of tests and the generalizability of results (Gangolly and Wu, 2000; Krippendorff, 2004; Fisher et al., 2010). Developing an automatic method for article classification would be an appropriate remedy for these issues. The following section reviews relevant literature in both the information science field and accounting and information systems on the origin, development and application of the automatic classification technique, and the review then leads to the three main research questions of this study.

2.2. The automatic classification of terms and documents

The automatic text classification technique is still a developing methodology in the literature. Thus far it has not been applied to classify characteristics of academic research in either accounting/accounting information systems or other disciplines very successfully or extensively.

The development of the automatic approach and its usage on classifying terms and useful thesaurus and generating groups of concepts has been explored in the information science literature decades before its application in the accounting field (Crawford, 1979; Crouch, 1990; Crouch and Yang, 1992; Chen et al., 1995). For example, Crouch (1990) developed a method of automatic thesaurus construction based on the term discrimination value model. The automatic classification method has been shown to produce useful thesaurus classes that improve information retrieval when used to supplement query terms (Crouch, 1990; Crouch and Yang, 1992). Terms and thesauri have been automatically generated by algorithmic approaches to support the generation and evaluation of systems and networks (Chen and Lynch, 1992; Chen et al., 1995). Similar techniques were developed for a domain-specific thesaurus for Drosophila information (Chen et al., 1994) and for computing a knowledge base for the Worm classification system (Chen and Lynch, 1992). These studies in the information science field have all contributed to the building of the automatic technique in different aspects.

Upon reviewing the literature of accounting and accounting information systems, automatic classification techniques have been found in the application of categorizing financial accounting concepts, conducting textual mining, information retrieval, and readability of financial statements and other accounting information (Gangolly and Wu, 2000; Garnsey, 2006; Garnsey and Fisher, 2008; Fisher et al., 2010). Gangolly and Wu (2000) examined the automatic classification of financial accounting concepts. The study provided statistical analysis of term frequencies in financial accounting standards, applied principal components analysis to decrease dataset dimensionality, and used an agglomerative nesting algorithm to derive clusters of financial accounting concepts. It is worthwhile to note that the study pointed out that the increasing size of text databases and the high cost of domain expertise incurred to develop classifications have necessitated the development of automatic indexing and conceptual classification methods.
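The kind of pipeline described for Gangolly and Wu (2000) can be illustrated with a minimal sketch. It is not their implementation: it clusters short passages rather than concepts, it omits the principal components step, it assumes cosine distance with average linkage, and the four toy passages are invented purely for illustration.

```python
from collections import Counter
import math
import re


def term_frequencies(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerate(vectors, n_clusters):
    """Bottom-up clustering with average linkage (an assumed linkage choice).

    Starts with one cluster per item and repeatedly merges the two closest
    clusters until n_clusters remain. Returns lists of item indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(vectors[p], vectors[q])
                         for p in clusters[i] for q in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy passages, invented for illustration only.
passages = [
    "lease liability and lease asset recognition",
    "lessee lease payments and lease liability",
    "pension cost and pension obligation",
    "pension plan assets and pension obligation",
]
vectors = [term_frequencies(p) for p in passages]
clusters = agglomerate(vectors, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

On this toy input the two lease passages and the two pension passages end up in separate clusters because each pair shares its most frequent terms.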

Extending the usefulness of the automatic classification technique, information retrieval and text mining processes have been used in conjunction with automatic document processing methods. Fisher et al. (2010) recently examined the role of text analytics and information retrieval literature; the study reviewed both manual and computational content analysis of accounting narratives, readability and text-mining studies, as well as studies that consist of the extraction of both text elements and quantities embedded in text from accounting documents. The study implied that computerized content analysis overcomes the limitations introduced by manual content analysis by using software to process electronic textual documents and count word frequency. The relevant literature in accounting and information systems suggested and implied the need for a computerized/automated classification tool for the usage of accounting information, concepts, and documents. However, as in the information science research, “academic literature” has not been used as the main classification object for automatic classification techniques in prior accounting and accounting information systems literature either. This observed gap in prior research is where our study contributes.

In line with prior literature that draws inferences from terms in documents, this study explores the automatic classification of research attributes by using research articles' keywords and abstracts. A research article incorporates many subsections; each represents a different aspect of the manuscript's research contribution. While the title of any article can create a general impression about the topic of the paper, it does little to answer content details such as which foundation discipline the article belongs to or what type of method it applied. Keywords are provided by the authors to enable readers to get an impression of the content of an article, such as which research area the paper belongs to or what techniques were used to conduct the research. Keywords may serve as a dense summary for a document, lead to improved information retrieval, or be the entrance to a document collection (Barki et al., 1993; Hulth, 2003). We expect that using keywords appropriately could facilitate the process of categorization to a great extent. A systematic pattern in the usage of keywords could expedite the process of automatic classification; this argument leads to our first research question:

RQ1: Using only keywords provided in journal articles, how can the classification process of accounting literature be automated?
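One simple way to picture keyword-based classification is a rule table that maps cue terms to categories. The sketch below is only an illustration of that idea under stated assumptions: the category labels, the cue terms, and the function name `classify_by_keywords` are hypothetical and are not taken from the taxonomy or the rules used in this study.

```python
# Hypothetical rule table: each category maps to a set of cue terms.
# Labels and terms are invented for illustration only.
RULES = {
    "auditing": {"audit", "auditor", "internal control"},
    "managerial": {"budgeting", "cost allocation", "performance evaluation"},
    "financial": {"earnings", "disclosure", "financial reporting"},
}


def classify_by_keywords(keywords, rules=RULES, default="unclassified"):
    """Return the category whose cue terms match the most author keywords.

    Keywords are compared case-insensitively. If no cue term matches,
    the default label is returned. Ties go to the category listed first.
    """
    normalized = {k.strip().lower() for k in keywords}
    scores = {label: len(normalized & cues) for label, cues in rules.items()}
    best_label = max(scores, key=scores.get)
    return best_label if scores[best_label] > 0 else default


print(classify_by_keywords(["Audit", "Internal control", "Earnings"]))  # auditing
print(classify_by_keywords(["Taxonomy"]))  # unclassified
```

A rule table like this is easy to inspect, but it depends entirely on authors using keywords in a systematic way, which is the pattern the research question asks about.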


Academic research articles generally start with an “abstract” which summarizes the salient points of apaper. Prior literature has explored the usage of article abstracts in assisting automatic classification of biologyresearch. Nobata et al. (1999) explored the identification and classification of biology terms by applyinginformation extraction methods that could be generalized to 100 biological abstracts from MEDLINE. Thespecific techniques adopted to classify terms are statistical and decision tree methods; those used for termcandidate identification are shallow parsing, decision trees, and statistical identification methods. The studyconcluded that applying statistical and decision tree methods to automate biology terms generate the mostaccurate automatic classification results. They suggested that classification could vary based on the wordlistused as it introduces variance in class types and terms; future studies need to refine algorithms used forautomatic classification to reach higher classification accuracy.

Article abstracts can ideally provide crucial information about the research article, such as its underlying motivation, research questions, research techniques, main findings, and implications. Abstracts may serve as convenient resources for generating useful wordlists to categorize article characteristics automatically. The second research question in this study examines this argument:


RQ2: Using the abstracts of academic journal articles, how can we automate the process of classification of accounting literature?


Literature has shown that the extent of accuracy will likely vary based on the different measures applied in the automatic classification process (e.g. Nobata et al., 1999; Hulth, 2003). In line with prior studies, we examine the potential automatic classification differences between keyword and abstract results by comparing the overall classification accuracy; the third research question is as follows:


RQ3: Is the accuracy of classification of accounting literature impacted by the combination or choice of elements used in the automation process?


Taken together, the automatic text classification method originated in the information science literature and has been used to make traditional manual classification more efficient and more consistent. While the automatic method has been widely used in many fields, only limited research has examined the automatic classification of research characteristics in the accounting and accounting information systems domain. This study aims to contribute to this line of research and expects to improve the classification approach long used in accounting thought development research, and to expand the number of articles that can be classified and examined promptly in research.

Please cite this article as: Chakraborty V, et al, Automatic classification of accounting literature, Int J Account Inf Syst (2014), http://dx.doi.org/10.1016/j.accinf.2014.01.001


3. Methodology

3.1. Sample collection and ARD taxonomy of attributes

Sample articles are hand collected from the EbscoHost online academic database and selected based on two main criteria. The first is the accessibility of keywords and abstracts in articles from the database; keywords and abstracts of articles are used to generate wordlists to create the training dataset in our studies. The second is the availability of research attributes classified in the Accounting Research Directory (ARD). To measure how precise automatic classification of research attributes is, this study uses a set of manual research attribute classifications extracted from the ARD (Brown et al., 1994)2 to benchmark against automatic classifications. Aiming at facilitating the research efforts of accounting academics and practitioners, the ARD provides characteristics of research articles published in twelve leading accounting and accounting information system journals3 since 1963 by twelve research taxonomies (Appendix I). The ARD has successfully supported prior literature in understanding the characteristics and research contribution of accounting and accounting information systems publications over time (e.g. Brown et al., 1987; Vasarhelyi et al., 1988; Brown et al., 1989; Badua, 2005; Badua et al., 2011).

As a pilot study of automatic research attribute classification in accounting, it is necessary to select taxonomic categories in which each sub taxonomic category is represented by a fair number of articles, so that the classification studies generate representative results. This study examines classifications of three taxonomic categories: accounting area, treatment, and mode of reasoning.4 A sample of 186 articles5 was used in the first study; the second study applies a balanced class approach encompassing 158 (in accounting area), 453 (in treatment), and 218 (in mode of reasoning) articles. In the final study, a greatly increased sample size was used in an attempt to achieve higher classification accuracy, with 763 articles in the accounting area taxon, 627 articles in the treatment taxon, and 772 articles in the mode of reasoning taxon. Our sample articles were published between 1984 and 2008 and selected from ten accounting journals (Table 1).

3.2. Analyses: a two-phase study of automatic literature classification

Fig. 1 illustrates the two main phases of the study: 1) phase I uses only keywords, while 2) phase II utilizes the full abstract of journal articles. The first step in the automatic classification process was to develop a parsing tool to extract article keywords and full article abstracts. Parsing is a technique that has been developed in the linguistic and computer science literature to analyze a given text by reasoning out the grammatical structure applied in the text, also known as “syntactic analysis.” Semantic parsing allows one to create a customized language for specific objectives and is an essential step for Natural Language Processing (NLP), information retrieval, and machine translation (e.g. Tucker, 1984; Trueswell et al., 1994; Salton et al., 1996; Thompson et al., 1999).

Prior to performing the data mining process some data preprocessing was done.

1) To eliminate unwanted words, a combination of two stop word lists was used. The function of a stop word list is to eliminate frequently occurring words that do not have any semantic bearing. The first stop word list applied in the study contains 571 words.6 The second stop word list is obtained from the Onix

2 The nature of the ARD's taxonomy and classification carries potential weaknesses that should be treated with caution. Manual classification of journal articles has been completed by multiple experts over a span of more than two decades; this may introduce varied classification precision across experts. That is, the same article may be assigned different attributes by different experts. In addition, attributes of research articles that fall under a relatively recent or emerging research area may be difficult to identify with the ARD taxonomy. A formalized and automated classification process could standardize the entire procedure.

3 Journal of Accounting Research, The Accounting Review, Accounting, Organizations and Society, Journal of Accounting Auditing and Finance, Journal of Accounting and Economics, Auditing: A Journal of Practice & Theory, Contemporary Accounting Research, Accounting Historians Journal, Journal of Information Systems, Journal of Accounting and Public Policy, Research in Accounting Regulation, and Journal of Emerging Technologies in Accounting.

4 Appendix III provides descriptions of three taxonomy categories: accounting area, treatment and mode of reasoning taxons (Brown et al., 1994 and Badua, 2005).

5 In analysis I, out of the 282 preliminary sample articles selected (based on ARD classified article list), 96 of them were excluded for not providing either keywords or abstracts in the article.

6 SMART (Salton and Buckley, 1996).


Table 1
Selected accounting journals for articles collection.

| Abbreviation | Journal |
|---|---|
| AOS | Accounting, Organizations and Society |
| AUD | Auditing: A Journal of Practice and Theory |
| CAR | Contemporary Accounting Research |
| JAAF | Journal of Accounting, Auditing and Finance |
| JAE | Journal of Accounting and Economics |
| JAPP | Journal of Accounting and Public Policy |
| JAR | Journal of Accounting Research |
| JETA | Journal of Emerging Technologies in Accounting |
| JIS | Journal of Information Systems |
| TAR | The Accounting Review |


Text Retrieval Toolkit. Table 2 provides examples of extracted keywords and words from abstracts that are included in the study to classify the treatment taxonomic category.

2) After eliminating the stop words, a word count of the remaining words was performed. This step provides information on how many times a particular word or phrase occurs in an article. If the count for a word is high, it suggests that it is a frequently occurring word. Term frequency was calculated as the next step

[Figure: process diagram titled "Automatic Classification of Accounting Literature". Both Phase I (using keywords) and Phase II (using full abstracts) contain the boxes: journal articles, database, parse out keywords (Phase I) or parse out full abstract (Phase II), word count, term frequency, attribute selection, create document-term matrix, apply classification algorithms. The diagram also contains a "create n-grams" box.]

Fig. 1. Two-phase experiment process diagram. The first phase uses keywords and the second phase uses the full abstracts of all collected articles.


Table 2
Examples of keywords used for treatment taxon classification.

| Treatment classes | Examples of words from full abstracts |
|---|---|
| Auditing | Client acceptance, audit partners, hypothesis generation, audit-scope. |
| Managerial | Decision making, measure, compare, efficiency, evaluation. |
| Financial | Market reactions, news, speculation, accruals. |

| Treatment classes | Examples of keywords |
|---|---|
| Auditing | Auditor independence, risk evaluation, risk adaptation, cost of continuous audits, materiality. |
| Managerial | Decision performance, incentives compensation, mental models, performance measures. |
| Financial | Residual income valuation model, economic rents, earnings expectations. |


which measures the number of occurrences of a given word or phrase across the full list of journal articles. This is achieved by adding up the word counts from each of the articles for a particular term.7

3) Finally, a document-term matrix was created based on term frequency. It is a mathematical matrix whose rows correspond to documents and whose columns correspond to terms; it shows the frequency of terms appearing in a collection of documents. An example of a document-term matrix is given in Table 3. As the real document-term matrix created in this study is very large, Table 3 provides an example of terms and numbers rather than the actual data. The first column, “File,” represents a particular journal article. The column headings from the second column onward show the list of words that occur in various journal articles, e.g. “Residual Income,” “Audit-scope,” “Materiality” and “Performance measure.” Each number in a matrix cell represents how many times a certain word or phrase appears in a particular article, e.g. the word “materiality” occurs two times in the 31.txt document.
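The three preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' tool: the stop word set, the tokenizer, and the two sample documents are hypothetical stand-ins chosen for illustration (the study combined the SMART and Onix stop word lists and worked on real keywords and abstracts).

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list (assumption); the study used two full lists.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "on"}

def tokenize(text):
    """Lower-case the text and return alphabetic tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def word_counts(text, stop_words=STOP_WORDS):
    """Step 1 and 2: drop stop words, then count the remaining words of one article."""
    return Counter(t for t in tokenize(text) if t not in stop_words)

def document_term_matrix(docs):
    """Step 3: one row of counts per document, one column per term."""
    counts = {name: word_counts(text) for name, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    rows = {name: [c[t] for t in terms] for name, c in counts.items()}
    return terms, rows

def term_frequency(terms, rows):
    """Add up each term's word counts over all documents."""
    return {t: sum(r[i] for r in rows.values()) for i, t in enumerate(terms)}

# Hypothetical example documents (file names borrowed from the style of Table 3).
docs = {
    "31.txt": "The materiality of the audit-scope and materiality judgments",
    "1400.txt": "Residual income and materiality in valuation",
}
terms, rows = document_term_matrix(docs)
tf = term_frequency(terms, rows)
```

In this toy input, the cell for "materiality" in the 31.txt row holds 2, mirroring how a cell of Table 3 is read.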

In the final step, data mining algorithms were applied to the document-term matrix. Four broad categories of data mining algorithms were used, including supervised learning, rule-based classifiers, decision trees, and other miscellaneous algorithms. A detailed list of the applied algorithms and their corresponding results is provided in Appendix III.
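To make the idea of applying a classifier to a document-term matrix concrete, the following sketch trains a plain multinomial Naive Bayes model with add-one smoothing on count rows. This is an illustrative assumption, not the study's implementation: the study ran many algorithms, and the rows, labels, and function names below are hypothetical.

```python
import math
from collections import defaultdict

def train_nb(rows, labels):
    """Fit a multinomial Naive Bayes model on document-term count rows."""
    n_terms = len(rows[0])
    class_docs = defaultdict(int)
    term_totals = defaultdict(lambda: [0] * n_terms)
    for row, label in zip(rows, labels):
        class_docs[label] += 1
        for i, count in enumerate(row):
            term_totals[label][i] += count
    model = {}
    for label, totals in term_totals.items():
        denom = sum(totals) + n_terms  # add-one (Laplace) smoothing
        log_prior = math.log(class_docs[label] / len(rows))
        log_like = [math.log((t + 1) / denom) for t in totals]
        model[label] = (log_prior, log_like)
    return model

def predict_nb(model, row):
    """Return the class with the highest log posterior score for one row."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(c * ll for c, ll in zip(row, log_like))
    return max(model, key=score)

def accuracy(model, rows, labels):
    """Share of rows whose predicted class equals the given label."""
    hits = sum(predict_nb(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical data. Columns: residual income, audit-scope, materiality, performance measure
rows = [[6, 0, 1, 0], [4, 0, 0, 0], [0, 3, 2, 0],
        [0, 4, 2, 0], [0, 0, 0, 3], [1, 0, 0, 2]]
labels = ["Financial", "Financial", "Auditing",
          "Auditing", "Managerial", "Managerial"]
model = train_nb(rows, labels)
```

Classification accuracy, as reported throughout the results, is the share of articles whose predicted class matches the benchmark class.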

As shown in Fig. 1, the basic steps for Phase I and Phase II are identical. The main difference is that the full abstract is extracted for Phase II whereas only keywords are extracted for Phase I. Phase II has one additional step after the extraction of words from the full abstract: the creation of word bi-grams and tri-grams, with the objective of finding meaningful phrases that have been used in the articles. The remaining steps in Phase II are essentially the same as in Phase I.
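The creation of bi-grams and tri-grams can be sketched as follows. The token list is a hypothetical example, and the helper names are assumptions introduced for illustration only.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-word phrases formed by adjacent tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_counts(tokens, sizes=(2, 3)):
    """Count all bi-grams and tri-grams found in one token sequence."""
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(tokens, n))
    return counts

# Hypothetical tokens from an abstract after stop word removal.
tokens = ["residual", "income", "valuation", "model"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
counts = phrase_counts(tokens)
```

Phrases such as "residual income" then become candidate columns of the document-term matrix alongside single words.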

3.3. Validation of automatic classifications

In line with prior literature (e.g. Liu et al., 1998; Nigam et al., 2000; Hall et al., 2009), the validation process of the study was carried out by 1) five-fold cross validation and 2) a 66% split for the training dataset (with the remaining 34% used for the test dataset). The five-fold cross validation first divides the keywords and abstracts into five subsets; the first subset is then used as a test set while the others together serve as the training set. This process continues by using the second subset as the test set and the remaining ones as the training set; the iterative process continues five times. The 66% split method splits the dataset into two parts: the first part, which consists of 66% of the data, is used for training, while the remaining 34% is used for testing (Hall et al., 2009). The next section presents the automatic classification findings drawn from three studies.
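Both validation schemes can be sketched in a few lines. The shuffling, the seed, and the rounding of the cut point are assumptions made for this illustration; the source does not specify how the tool it used assigns articles to folds.

```python
import random

def five_fold_splits(n_items, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def percentage_split(n_items, train_share=0.66, seed=0):
    """Return (train, test) index lists for a single 66% / 34% split."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = round(n_items * train_share)  # rounding rule is an assumption
    return idx[:cut], idx[cut:]

splits = list(five_fold_splits(186))
train, test = percentage_split(186)
```

With 186 articles, each of the five test folds holds 37 or 38 articles, and the percentage split leaves 123 articles for training and 63 for testing under the rounding assumed here.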

4. Results and analysis

Within the three studies, results obtained from the third study using full abstracts of 282 articles for classification show the most accurate results. The approach and findings of the three studies follow.

7 A measure of how often a term is found in a collection of documents. TF is combined with inverse document frequency (IDF) as a means of determining which documents are most relevant to a query. TF is sometimes also used to measure how often a word appears in a specific document.


Table 3
An example of a document-term matrix.

| File | Residual income | Audit-scope | Materiality | Performance measure |
|---|---|---|---|---|
| 31.txt | 6 | 2 | 2 | 1 |
| 1400.txt | 3 | 0 | 2 | 0 |
| 6732.txt | 4 | 0 | 2 | 0 |
| 8902.txt | 0 | 3 | 2 | 1 |
| 4569.txt | 3 | 0 | 0 | 2 |
| 8726.txt | 0 | 4 | 0 | 1 |
| 7239.txt | 7 | 1 | 0 | 0 |
| 543.txt | 4 | 1 | 2 | 3 |


4.1. Analysis I: Keywords and abstracts approach

The first study sampled 282 articles, with keywords applied in phase I and abstracts applied in phase II of the study (Fig. 1). The algorithms that generated the highest classification accuracy for each taxon are shown in Table 4. The highest classification accuracy was found in Accounting Area8 (85.31%), achieved by the Complement Naïve Bayes algorithm of the Bayes classifier group. In Treatment, the best results were obtained by the END algorithm and Complement Naïve Bayes of the Bayes classifier group, both at 81.67%. In Mode of Reasoning, the OneR algorithm of the rule-based classifier group correctly classified the most articles, at 56.62%, in analysis I of this taxonomic category.9

The findings in study I indicate that using abstracts (phase II) to automatically classify articles' attributes results in higher classification accuracy than using keywords (phase I). However, an exception occurred in Mode of Reasoning; the classification accuracy deteriorated in Phase II in comparison with Phase I, which is the opposite of the findings in the Accounting Area and Treatment taxons, where the full abstract (phase II) improves the general results. The weaker performance in the Mode of Reasoning taxon may be explained, to a certain extent, by its many subclasses (11) compared to Treatment (3) and Accounting area (6). These eleven subclasses have a non-uniform data representation. For example, within the 282 articles, only 7 articles used the Factor analysis/Probit/Discriminant reasoning method, whereas 158 articles applied the Regression analysis method. This non-uniform dataset may have prevented the training and testing datasets in this study from generating results with higher accuracy. To improve the results from analysis I, a probable approach is to expand the study dataset and collect articles with a fairly uniform representation of articles in the subclasses of the tested taxons. This may help generate better results, as the training dataset will be more comprehensive after the subclasses are populated with more sampled articles and with a more uniform representation. The next section elaborates on the second study, which applies an expanded dataset with a manipulated balance of classes.

4.2. Analysis II: keywords, abstracts and balanced class manipulation approach

The second study was conducted to improve classification accuracy. The sample size was increased and a balanced class manipulation approach was adopted to tackle the potential negative effect caused by the imbalanced class problem (Tan et al., 2005). The attributes of the sampled accounting literature appear to be concentrated in a few subclasses of the taxons rather than evenly distributed among the different subclasses. For example, the majority of the sampled articles were classified under “Financial Accounting” in Accounting Area. Under Mode of Reasoning, the dominant methods applied in the sampled accounting studies are quantitative methods, with a strong focus on “Quantitative: Regression.” The results found in Accounting Area, Treatment, and Mode of Reasoning are presented next:

4.2.1. Accounting area

The manipulation of the articles used in the study involved balancing them across classes up to a certain threshold. The ratio of the number of articles in each class to that in the least popular class in every taxon was calculated. A threshold must be set prior to the study, and this study arbitrarily chose 2.5 as the cutoff point. Table 5a displays the number of articles of each class used in the study.
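The ratio check and the effect of a 2.5 cutoff can be sketched as below. The capping rule (truncate every class to 2.5 times the smallest class, rounded down) is an assumption made for illustration; the source does not state how articles were selected, and the class sizes it reports in Table 5a differ slightly from what this rule produces.

```python
def max_min_ratio(class_sizes):
    """Ratio of the largest class to the least popular class."""
    return max(class_sizes.values()) / min(class_sizes.values())

def cap_classes(class_sizes, threshold=2.5):
    """Cap each class at threshold x the smallest class size (rounded down).

    Hypothetical rule: one simple way to bring the ratio under the cutoff.
    """
    cap = int(threshold * min(class_sizes.values()))
    return {name: min(size, cap) for name, size in class_sizes.items()}

# Original collected dataset for the accounting area taxon (Table 5a).
original = {
    "Tax": 32,
    "Financial": 288,
    "Managerial": 102,
    "Audit": 215,
    "Information systems": 15,
}
balanced = cap_classes(original)
ratio_before = max_min_ratio(original)
ratio_after = max_min_ratio(balanced)
```

The original max/min ratio is 288 / 15 = 19.2, which matches Table 5a; after capping, no class is more than 2.5 times the size of the smallest one.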

8 .

9 The complete experimental results of the two-phase analysis are provided in Appendix III.


Table 4
Analysis I: summary of classification results.

| Taxon | Phase | Algorithms | Classification accuracy |
|---|---|---|---|
| Accounting area | Phase II: Abstracts | Bayes classifier: Complement Naïve Bayes | 85.31% |
| Treatment | Phase II: Abstracts | Bayes classifier: Complement Naïve Bayes; Miscellaneous: END | 81.67% |
| Mode of reasoning | Phase I: Keywords | Rule-based classifier: OneR | 56.62% |


Results shown in Table 5b suggest that the keywords phase provided better results than abstracts, with Naïve Bayes of the Bayes classifier group providing the highest accuracy at 67.21% under the five-fold cross validation classification method; the second highest result, 66.67% accuracy, is achieved by the J48 and J48graft algorithms in the decision trees classifier group and the PART algorithm of the rule-based classifier group under the 66% split classification method. The overall findings, however, showed a deterioration of accuracy in comparison to the results in analysis I (whose highest result is 85.31%).

4.2.2. Treatment

The class distribution of the most dominant to the least popular class in treatment shows a ratio of 1.95, which is below the threshold of 2.5; therefore, the original collected dataset (Table 6a) is fully used in the study. Considering that treatment is a category that breaks down a certain research area into sub-categories, we merged the categories that belong to the same main accounting area into one general class. Specifically, subclasses under the treatment taxon numbered #100–#171 were combined into a general financial accounting group, classes numbered #200–#219 together represent the general auditing group, and the #300–#312 subcategories make up the general managerial group.
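The merging of treatment subclasses into three general classes amounts to a range lookup, sketched below. The function name and the handling of codes outside the three ranges are assumptions; the group names follow Table 6a.

```python
def treatment_group(code):
    """Map a treatment subclass number to its merged general class."""
    if 100 <= code <= 171:
        return "General Financial Accounting"
    if 200 <= code <= 219:
        return "General Auditing"
    if 300 <= code <= 312:
        return "General Managerial"
    return None  # assumption: codes outside the three ranges stay unassigned

# Hypothetical subclass codes attached to four articles.
codes = [150, 219, 312, 400]
groups = [treatment_group(c) for c in codes]
```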

Comparing these results with those of Analysis I, we find a substantial reduction in accuracy, with the highest level of accuracy being 42%, obtained using the Random Tree and Simple Cart algorithms of the decision trees family under the 66% split classification in the keywords phase. The detailed findings are shown in Table 6b.

4.2.3. Mode of reasoning

After balancing the classes, the articles applied in the mode of reasoning section total 218 (Table 7a). The accuracy of results greatly declined from the previous study, with the highest accuracy being only 39%, achieved by the J48 and J48graft algorithms of the decision trees classifier group (Table 7b).

4.2.4. Conclusion from studies I & II

Comparing results from the two studies, this study arrived at more favorable results from study I, with keywords and abstracts applied to the 282 sampled articles. The highest accuracy level is obtained from the accounting area taxon at 85.31%, which is achieved by the Complement Naïve Bayes algorithm from the Bayes classifier group. While results in the mode of reasoning taxon seem less rewarding, findings in the treatment

Table 5a
Balancing classes: articles used in each subclass in accounting area taxon.

| Accounting area | Original collected dataset | Manipulated balanced class dataset used |
|---|---|---|
| Tax | 32 | 32 |
| Financial | 288 | 38 |
| Managerial | 102 | 35 |
| Audit | 215 | 38 |
| Information systems | 15 | 15 |
| Total articles | 652 | 158 |
| max/min | 19.2 | 2.5 |


Table 5b
Analysis II: summary results of accounting area.

| Input | Five-fold cross validation: algorithms | Classification accuracy | 66% percentage split: algorithms | Classification accuracy |
|---|---|---|---|---|
| Keywords | Bayes classifier: Naïve Bayes | 67.21% | Decision trees classifiers: J48, J48graft; Rule-based classifier: PART | 66.67% |
| Keywords | Decision trees classifier: Random Forest | 60.65% | | |
| Keywords | Rule-based classifier: PART | 60.65% | Bayes classifier: Naïve Bayes | 42.85% |
| Abstracts | Bayes classifier: Bayes Net | 32.60% | Bayes classifier: Complement Naïve Bayes | 38.70% |
| Abstracts | Decision trees classifier: J48, J48graft | 29.34% | Decision trees classifier: Random Forest | 29.03% |
| Abstracts | Rule-based classifier: PART | 28.26% | Rule-based classifier: PART | 22.58% |


taxon provide favorable results, with Complement Naïve Bayes of the Bayes classifier group and the END algorithm achieving accuracy of 81.67%.

The second study was conducted for the purpose of improving the results from study I, mainly in the mode of reasoning taxon, by balancing the classes in each taxon. However, the results surprisingly deteriorated from study I. One possible reason for this deterioration could be the limited data available for study II after applying the class-balancing approach. The highest accuracy occurred in the accounting area taxon but only at 67.21%, obtained by the Naïve Bayes algorithm of the Bayes classifier family under the five-fold cross validation method. The less favorable results could possibly be explained by the trade-off between balancing the distribution of classes and the number of articles eventually used in the study. In other words, though the article dataset has been expanded, the number of articles used in study II after the class balance manipulation turns out to be less than that applied in study I.

Hence, the negative effect of the imbalanced class issue on classification accuracy appears to be less significant than we had earlier predicted. The final study aimed at improving the results by increasing the sample size. This increased sample size approach yielded by far the best results in terms of classification accuracy.

4.3. Final study — increased sample with abstracts applied

The final study was built upon the first study, using a sample of articles about three times the size of that in study I. Specifically, 763 articles were used for the accounting area taxon, 627 articles for the treatment taxon, and 772 for mode of reasoning. Since studies showed that phase II (using abstracts) provided better results than phase I (using keywords), this study was conducted solely by using abstracts retrieved from the collected articles to provide automatic classification results.

4.3.1. Accounting area

The findings slightly improved on the results obtained in analysis I. The Complement Naïve Bayes algorithm of the Bayes classifier group once again provides the best result, with accuracy of 85.78% under the 66% split method. The second and third highest accuracy results are achieved by the J48graft of decision tree

Table 6a
Balancing classes: articles used in each subclass in treatment taxon.

| Treatment | Original collected dataset | Manipulated balanced class dataset used |
|---|---|---|
| #100–#171 General Financial Accounting | 191 | 191 |
| #200–#219 General Auditing | 164 | 164 |
| #300–#312 General Managerial | 98 | 98 |
| Total | 453 | 453 |
| max/min ratio | 1.95 | 1.95 |


Table 6b
Analysis II: summary results of treatment.

| Input | Five-fold cross validation: algorithms | Classification accuracy (%) | 66% percentage split: algorithms | Classification accuracy (%) |
|---|---|---|---|---|
| Keywords | Decision trees classifier: Random Forest | 41.68 | Decision trees classifier: Random Tree, Simple Cart | 42.00 |
| Keywords | Bayes classifier: Naïve Bayes, Naïve Bayes Updateable | 38.37 | Bayes classifier: Naïve Bayes, Naïve Bayes Multinomial, Naïve Bayes Multinomial Updateable, Naïve Bayes Updateable; Rule-based classifier: JRip | 40.90 |
| Keywords | Rule-based classifier: JRip | 37.20 | | |
| Abstracts | Bayes classifier: Complement Naïve Bayes, Naïve Bayes Multinomial | 30.63 | Bayes classifier: Complement Naïve Bayes; Rule-based classifier: OneR | 26.11 |
| Abstracts | Decision trees classifier: Random Forest | 23.03 | | |
| Abstracts | Rule-based classifier: JRip | 29.62 | Decision trees classifier: Random Forest | 20.89 |


classifier group at 81.42%, followed by the Bayes Net algorithm of the Bayes classifier group at 80.96%; both used the five-fold cross validation method of classification (Table 8).


4.3.2. Treatment

Under the treatment taxon, the most accurate classification result was also achieved in the final study. Both the J48graft algorithm of the decision trees classifier group and the Ridor algorithm of the rule-based classifier group accurately classified 87.27% of articles under the five-fold cross validation method.

The second best result was reached by using the DMNB Text algorithm of the Bayes classifier group, with 86.67% of articles correctly classified. The J48graft algorithm of the decision tree family is the third best algorithm used and also ranks the highest under the 66% split, accurately classifying 85.71% of articles. J48graft is followed by the DMNB Text and Ridor algorithms, which provide accurate results at 83.93% (Table 9).


4.3.3. Mode of reasoning

The level of accuracy improved most significantly under the mode of reasoning taxon in the final study in comparison with the prior two studies (Table 10). The highest accuracy level, 70.18%, is obtained from both the Naïve Bayes Multinomial and Naïve Bayes Multinomial Updateable algorithms of the Bayes classifier group under the five-fold cross validation classification method. This finding is an improvement of nearly 14 percentage points over the result shown in the first study (56.62%).

Table 7a
Balancing classes: articles used in each subclass in mode of reasoning taxon.

| Mode of reasoning | Original collected dataset | Manipulated balanced class dataset used |
|---|---|---|
| Quantitative: DS | 51 | 33 |
| Quantitative: Regression | 270 | 35 |
| Quantitative: Anova | 73 | 33 |
| Quantitative: Factor Analysis/MDS/Probit/Discriminant | 19 | 19 |
| Quantitative: Non-Parametric | 16 | 16 |
| Quantitative: Correlation | 14 | 14 |
| Quantitative: Analytical | 107 | 35 |
| Qualitative | 49 | 33 |
| Total | 599 | 218 |
| max/min ratio | 19.28 | 2.5 |


Table 7b
Analysis II: summary results of mode of reasoning.

| Input | Five-fold cross validation: algorithms | Classification accuracy (%) | 66% percentage split: algorithms | Classification accuracy (%) |
|---|---|---|---|---|
| Keyword | Decision trees classifier: J48, J48graft | 39.00 | Bayes classifier: Naïve Bayes | 35.29 |
| Keyword | Bayes classifier: Bayes Net | 38.89 | Decision trees classifier: BF Tree; Rule-based classifier: DecisionTable-Ridor | 29.41 |
| Keyword | Rule-based classifier: PART | 36.00 | | |
| Abstract | Decision trees classifier: J48, J48graft | 28.24 | Bayes classifier: DMNB Text | 31.11 |
| Abstract | Bayes classifier: Naïve Bayes, Naïve Bayes Multinomial, Naïve Bayes Multinomial Updateable, Naïve Bayes Updateable | 27.48 | Rule-based classifier: PART | 22.22 |
| Abstract | Rule-based classifier: PART | 25.95 | Decision trees classifier: Random Forest | 20.00 |


The second-highest result, 65.45%, is obtained under the 66% split by the DMNB Text algorithm of the Bayes classifier group. It is followed by the random forest algorithm (decision tree group), which ranks third at 64.22%.
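The two evaluation schemes reported throughout these tables differ only in how articles are assigned to training and test sets. The following is a minimal sketch of both, assuming a hypothetical corpus of 100 articles; the function names are illustrative, and the code does not attempt to reproduce the stratification or randomization of any particular data mining tool.

```python
import random


def percentage_split(n_items, train_fraction=0.66, seed=0):
    """Shuffle indices once; the first train_fraction train, the rest test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical corpus size, chosen only for illustration.
train_idx, test_idx = percentage_split(100)
folds = list(k_fold_splits(100, k=5))

print(len(train_idx), len(test_idx))
print([len(test) for _, test in folds])
```

With a percentage split, accuracy is measured once on roughly a third of the articles. With five-fold cross validation every article is used for testing once, so the accuracy estimate draws on the whole corpus at the cost of training five models.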

5. Conclusion and implications

This article develops a literature classification technique that automatically identifies and categorizes characteristics of accounting and accounting information systems research. It uses semantic parsing and data mining techniques to explore the possibility of a methodology that automatically classifies academic articles in accounting on the basis of various criteria and taxons. Three studies were conducted to examine and refine the classification process. Both keywords and abstracts of research articles were used in the first two studies, while only abstracts were used in the final study.
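To make the idea of classifying articles from their abstracts concrete, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing, one of the algorithm families evaluated in the study. It is not the authors' implementation: the toy abstracts, labels and function names are hypothetical, tokenization is a plain whitespace split, and no semantic parsing is performed.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_multinomial_nb(documents, labels):
    """Collect class priors, per-class word counts and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    return log_priors, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    log_priors, word_counts, vocabulary = model
    scores = {}
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy abstracts; real training data would be coded articles.
abstracts = [
    "auditor independence and audit opinion quality",
    "internal control testing by the external auditor",
    "tax planning and corporate tax avoidance",
    "effects of tax rates on firm investment",
]
areas = ["Audit", "Audit", "Tax", "Tax"]

model = train_multinomial_nb(abstracts, areas)
print(predict(model, "audit opinion and auditor behavior"))
print(predict(model, "corporate tax rates"))
```

The same training and prediction steps apply to any taxon; only the labels attached to the training abstracts change.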

A summary of the findings of the three studies is given in Table 11. The final study provides the most accurate results, with 87.27% of sampled articles correctly classified by the J48graft (decision trees) and Ridor (rule-based) algorithms in the treatment taxon under five-fold cross validation. Complement Naïve Bayes of the Bayes classifier group performed very well in accounting area classification, reaching an accuracy of 85.78% under the 66% split approach (Tables 8 and 11). In the mode of reasoning taxon, Bayes algorithms achieved a correct classification level of 70.18%, a large improvement over the earlier studies. The overall results of the final study significantly outperformed those of the first two, suggesting that abstracts are more effective than keywords for generating the training set used to automate classification. In addition, the results suggest that a larger sample size contributes to better automatic classification performance.

Taxonomization of literature has been critical in many disciplines. However, prior literature classified research works manually, a process that is fairly time consuming and can lead to classification inconsistencies. The findings of this study seem promising and indicate that these limitations can be mitigated by classifying the literature automatically. This study contributes to the general accounting literature and to the accounting information systems literature in particular. The analyses conducted

Table 8
Final study: summary results of accounting area.

| Five-fold cross validation: algorithms | Classification accuracy (%) | 66% percentage split: algorithms | Classification accuracy (%) |
| --- | --- | --- | --- |
| Decision Trees Classifier: J48graft | 81.42 | Bayes Classifier: Complement Naive Bayes | 85.78 |
| Bayes Classifier: Bayes Net | 80.96 | Decision Trees Classifier: J48graft | 79.05 |
| Rule-Based Classifier: Ridor | 77.98 | Rule-Based Classifier: Ridor | 72.97 |


Table 9
Final study: summary results of treatment.

| Five-fold cross validation: algorithms | Classification accuracy (%) | 66% percentage split: algorithms | Classification accuracy (%) |
| --- | --- | --- | --- |
| Decision Trees Classifier: J48graft; Rule-Based Classifier: Ridor | 87.27 | Decision Trees Classifier: J48graft | 85.71 |
| Bayes Classifier: DMNB Text | 86.67 | Bayes Classifier: DMNB Text; Rule-Based Classifier: Ridor | 83.93 |


use semantic parsing and classification methodologies, and the results affect the broader accounting literature by facilitating and strengthening the process of research classification. A broader range of learning and research needs could be satisfied by applying this automatic literature classification technique. For research in the accounting taxonomy and thought development areas in particular, it allows researchers to examine and review a much greater volume of literature with minimal manual effort. Such analysis would be supported by the many online electronic databases now available. Furthermore, the findings show that semantic parsing and data mining tools are useful for refining accounting literature classification, enabling researchers, graduate students and readers to arrive promptly at the numerous characteristics of accounting and accounting information systems research. They may also point to emerging taxonomic classes that suggest directions for future research.

Future research can build on this study by exploring automated classification under other criteria or taxons of the accounting literature. Other directions include developing techniques with higher precision, applying automatic taxonomization of publications to other disciplines, and sharpening the tools for analyzing research evolution. An integral part of developing similar classification processes for different taxons with several subclasses would be building a training dataset with a uniform representation of the different classes. It would also be interesting to explore whether new areas of research are developing, that is, whether new classes or taxons need to be added to the taxonomy. An automated method for this type of exploration would be an interesting extension of this study as well.

One limitation of this study is the insufficient number of articles sampled in developing the classification methodology. As the results of the studies demonstrate, classification accuracy improves significantly as the number of articles in the data corpus increases. Thus, downloading and sampling an even larger set of articles would likely increase classification accuracy further.

Phase II, which adopted article abstracts, leads to better results than the use of keywords. However, abstracts generally have word limits and as a result may not contain sufficient information to classify articles' attributes accurately. Including other sections of an article, such as the conclusion, results or methodology, or even the full article, would likely have a positive impact on the results. Future research could consider these factors.

Results from Phase I suggest that keywords are far less effective than article abstracts in correctly classifying journal articles. While keywords have been taken as measures of research articles'

Table 10
Final study: summary results of mode of reasoning.

| Five-fold cross validation: algorithms | Classification accuracy | 66% percentage split: algorithms | Classification accuracy |
| --- | --- | --- | --- |
| Bayes Classifier: Naïve Bayes Multinomial, Naïve Bayes Multinomial Updateable | 70.18% | Bayes Classifier: DMNB Text | 65.45% |
| Decision Trees Classifier: Random Forest | 64.22% | Decision Trees Classifier: Random Forest; Rule-Based Classifier: JRip | 62.83% |
| Rule-Based Classifier: ZeroR, Conjunctive Rule | 63.99% | | |


Table 11
Studies results comparison.

| Study | Taxon | Input and validation method | Algorithms | Classification accuracy |
| --- | --- | --- | --- | --- |
| Final study (abstracts) | Accounting area | 66% percentage split | Bayes Classifier: Complement Naive Bayes | 85.78% |
| Final study (abstracts) | Treatment | Five-fold cross validation | Decision Trees Classifier: J48graft; Rule-Based Classifier: Ridor | 87.27% |
| Final study (abstracts) | Mode of reasoning | Five-fold cross validation | Bayes Classifier: Naïve Bayes Multinomial, Naïve Bayes Multinomial Updateable | 70.18% |
| Analysis I | Accounting area | Phase II: Abstracts/66% Split | Bayes Classifier: Complement Naive Bayes | 85.31% |
| Analysis I | Treatment | Phase II: Abstracts/66% Split | Bayes Classifier: Complement Naïve Bayes; Miscellaneous: END | 81.67% |
| Analysis I | Mode of reasoning | Phase I: Keywords/66% Split | Rule-Based Classifier: OneR | 56.62% |
| Analysis II | Accounting area | Phase I: Keywords/66% Split | Decision Trees Classifiers: J48, J48graft; Rule-Based Classifier: PART | 66.67% |
| Analysis II | Treatment | Phase I: Keywords/66% Split | Decision Trees Classifier: RandomTree, Simple Cart | 42.00% |
| Analysis II | Mode of reasoning | Phase I: Keywords/Five-fold | Decision Trees Classifier: J48, J48graft | 39.00% |


content in the literature (Hulth, 2003), the finding in this study points towards the inadequacy of keywords for describing articles. Future research needs to assess further how and what kind of keywords should be used to represent article content correctly. Measures that future research could incorporate include creating a set of keywords and surveying authors for their opinion about using them in their articles, or suggesting the list to academic databases.

Finally, there may be judgmental differences among the experts who coded the articles used in this study over a prolonged period of time. As a result, different experts could have categorized similar articles under different taxons, which could have had a bearing on the final results. The accuracy of automatic classification was validated against the manual codes provided by the experts; if these manual codes contain judgmental differences, the results of automatic classification are affected in turn. To fine-tune and arrive at a robust standardized method for automatic article classification, future research should consider using multiple domain experts to reclassify all the journal articles in the training dataset so that judgmental errors do not bias the results.
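One common way to quantify judgmental differences between two coders is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The study does not report such a statistic; the sketch below is only an assumption about how a re-coding exercise of the kind proposed above could be checked, and the expert codes in it are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same articles."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected if each coder assigned labels independently
    # according to their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical accounting-area codes from two experts for six articles.
expert_1 = ["Tax", "Audit", "Audit", "Financial", "Tax", "Audit"]
expert_2 = ["Tax", "Audit", "Financial", "Financial", "Tax", "Tax"]

kappa = cohens_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.2f}")
```

For this toy data the coders agree on four of six articles and kappa works out to 0.52; values near 1 indicate consistent manual codes, while low values would suggest that the training labels themselves are noisy.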

Appendix I

I.A. Taxonomy Classes

A. Research Method
1. Analytical Internal Logic
2. Analytical Simulation
3. Archival - Primary
4. Archival - Secondary
5. Empirical Case
6. Empirical Field
7. Empirical Lab
8. Opinion Survey
9. Mixed

B. Inference Style
1. Inductive
2. Deductive
3. Both


C. Mode of Reasoning
1. Quantitative: Descriptive Statistics
2. Quantitative: Regression
3. Quantitative: Anova
4. Quantitative: Factor Analysis, MDS, Probit, Discriminant
5. Quantitative: Markov
6. Quantitative: Non-Parametric
7. Quantitative: Correlation
8. Quantitative: Analytical
10. Mixed
90. Qualitative

D. Mode of Analysis
1. Normative
2. Descriptive
3. Mixed

E. School of Thought
1. Behavioral - Hips
2. Behavioral - Other
3. Statistical Modeling - EMH
4. Statistical Modeling - Time Series
5. Statistical Modeling - Information Economics
6. Statistical Modeling - Mathematical Programming
7. Statistical Modeling - Other
8. Accounting Theory
9. Accounting History
10. Institutional
11. Other
12. Agency Theory
13. Expert Systems

F. Information
100. Financial Statements
101. Net Income or EPS
102. Income Statement
103. Balance Sheet
104. Cash Flows, Etc.
105. Other Fin. Statement
106. Financial Ratios
107. Combinations 1–2
108. Quarterly Reports
109. Foreign Currency
110. Pension
112. Debt Covenants
200. Internal Information
201. Performance Measures
202. Personality Measures
203. Auditor Behavior
204. Manager Behavior
205. Decision Making
206. Internal Controls
207. Costs


208. Budgets
209. Group Behavior
210. Pricing
211. Compensation
300. External Information
301. Footnotes
302. SEC Info (10-K)
303. Forecasts
304. Audit Opinion
305. Bond Rating
309. Other
400. Market Based Info
401. Risk
402. Security Prices or Return
403. Security Trading
404. Options
405. All Of The Above-Market
500. Mixed

G. Treatment
100. Financial Accounting Methods
101. Cash
102. Inventory
103. Other Current Assets
104. Property Plant & Equip/Depr
105. Other Non-Current Assets
106. Leases
107. Long Term Debt
108. Taxes
109. Other Liabilities
121. Valuation (Inflation)
122. Special Items
131. Revenue Recognition
132. Accounting Changes
133. Business Combinations
134. Interim Reporting
135. Amortization/Depletion
136. Segment Reports
137. Foreign Currency
141. Dividends-Cash
143. Pension (Funds)
150. Other - Financial Accounting
160. Financial Statement Timing
170. R & D
171. Oil & Gas
200. Auditing
201. Opinion
202. Sampling
203. Liability
204. Risk (????)
205. Independence
206. Analytical Review
207. Internal Control
208. Timing


209. Materiality
210. EDP Auditing
211. Organization
212. Internal Audit
213. Errors
214. Trail
215. Judgement
216. Planning
217. Efficiency - Operational
218. Audit Theory
219. Confirmations
300. Managerial
301. Transfer Pricing
302. Breakeven
303. Budgeting & Planning
304. Relevant Costs
305. Responsibility Accounting
306. Cost Allocations
307. Capital Budgeting
308. Tax (Tax Planning)
309. Overhead Allocations
310. HRA/Social Accounting
311. Variances
312. Executive Compensation
400. Other
401. Submissions To The FASB Etc.
402. Manager Decision Characteristics
403. Information Structures (Disclosure)
404. Auditor Training
405. Insider Trading Rules
406. Probability Elicitation
407. International Differences
408. Form Of Organization (Partnership)
409. Auditor Behavior
410. Methodology
411. Business Failure
412. Education
413. Professional Responsibilities
414. Forecasts
415. Decision Aids
416. Organization & Environment
417. Litigation
418. Governance

H. Accounting Area
1. Tax
2. Financial
3. Managerial
4. Audit
5. Information Systems
6. Mixed

I. Geography
1. Non-USA


2. USA
3. Both

J. Objective
1. Profit
2. Not for Profit
3. Regulated
4. All

K. Applicability
1. Immediate
2. Medium Term
3. Long Term

L. Foundation Discipline
1. Psychology
2. Sociology, Political Science, Philosophy
3. Economics & Finance
4. Engineering, Communications & Computer Sciences
5. Mathematics, Decision Sciences, Game Theory
6. Statistics
7. Law
8. Other Mixed
9. Accounting
10. Management

Appendix II

II.A. Description of taxonomy classes: accounting area, treatment and mode of reasoning


Accounting area: Identifies the major accounting field the paper covers. The major fields are tax, financial, managerial, audit, and information systems.

Treatment: Identifies the major factor or other accounting phenomena associated with or causing the information taxon. The treatment taxon will be the main predictor variable in the regression model in an empirical study. Main subcategories are financial accounting methods, auditing, managerial and others.

Mode of reasoning: Identifies the technique used to formally arrive at the conclusions of the study, either by quantitative or qualitative analysis. The quantitative subcategory includes various items, e.g. descriptive statistics, regression, ANOVA, factor analysis, non-parametric, correlations, and analytical.

Appendix III

III.A. Data mining algorithms

Classification 1: Bayes classifier

Algorithms
Bayes net
DMNB Text
Naïve Bayes
Naïve Bayes multinomial
Naïve Bayes multinomial updateable
Naïve Bayes updateable
Complement Naïve Bayes

Classification 2: decision trees classifier

Algorithms
J48
J48 graft
LAD tree
Random forest
Random tree
REP tree
Simple cart

Classification 3: rule based classifier

Algorithms
ZeroR
Ridor
PART
OneR
JRip
Decision table

Classification 4: other miscellaneous algorithms

Algorithms
Classification via regression
Multiclass classifier
Simple logistic
SMO
Attribute selected classifier
Bagging
Classification via clustering
CV parameter selection
Dagging
Decorate
END
Ensemble selection
Filtered classifier
Grading
Logit boost
Multi boost AB
Multi scheme
ND


Appendix IV

IV.A. Preliminary analysis I: detailed classification results.

Table 12
Results of Phase I Keywords Study – Treatment Taxon – with Five-fold Cross Validation.

12.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes net | 59.2% |
| Complement Naive Bayes | 29.6% |
| DMNB Text | 59.2% |
| Naive Bayes | 58.16% |
| Naive Bayes multinomial | 56.12% |
| Naive Bayes multinomial updateable | 56.12% |
| Naive Bayes updateable | 58.16% |

12.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 56.12% |
| J48graft | 56.12% |
| LADTree | 57.14% |
| Random forest | 57.14% |
| Random tree | 55.1% |
| REP tree | 59.18% |
| Simple cart | 60.25% |
| LMT | 56.12% |
| NB tree | 59.18% |

12.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| Zero R | 59.18% |
| Ridor | 59.18% |
| PART | 55.1% |
| One R | 56.12% |
| JRip | 57.14% |
| NNge | 46.94% |
| Decision stump | 57.14% |
| FT | 54% |

12.4. Classification 4: miscellaneous

| Algorithm | Correctly classified |
| --- | --- |
| Logistic | 58.16% |
| Simple logistic | 56.12% |
| SMO | 57.14% |
| K Star | 57.14% |
| LWL | 57.14% |
| Attribute selected classifier | 57.14% |
| Bagging | 59.18% |
| Classification via clustering | 59.18% |
| Classification via regression | 59.18% |
| CV parameter selection | 59.18% |
| Dagging | 59.18% |
| Decorate | 56.12% |
| END | 59.18% |
| Ensemble selection | 59.18% |
| Filtered classifier | 59.18% |
| Grading | 59.18% |
| Logit boost | 55.1% |
| Multi boost AB | 57.14% |
| Multiclass classifier | 57.14% |
| Multi scheme | 59.18% |
| Nested dichotomies class balanced ND | 57.14% |
| Data near balanced ND | 59.18% |
| ND | 59.2% |
| Ordinal class classifier | 59.18% |
| Raced incremental logit boost | 59.18% |
| Random committee | 58.16% |
| Random subspace | 59.18% |
| Rotation forest | 57.14% |
| Stacking | 59.18% |
| Stacking C | 59.18% |
| Vote | 59.18% |

Table 13
Results of Phase II – Treatment Taxon – Full Abstract Study.

13.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes Net | 69.5% |
| Complement Naive Bayes | 74.01% |
| DMNB Text | 68.36% |
| Naive Bayes | 54.23% |
| Naive Bayes multinomial | 72.88% |
| Naive Bayes updateable | 54.24% |
| SMO | 58.75% |
| Decision table | 70.05% |

13.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 66.67% |
| J48 graft | 71.75% |
| LAD tree | 65% |
| Random forest | 69.5% |
| Random tree | 51.41% |
| REP tree | 65.54% |
| Simple cart | 70.05% |
| BF tree | 70.06% |
| FT | 63.84% |

13.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| Zero R | 48.6% |
| Ridor | 63.28% |
| PART | 62.71% |
| One R | 65.54% |
| JRip | 63.28% |
| Decision table | 70.05% |
| Conjunctive rule | 65.54% |
| NNge | 58.76% |
| Decision stump | 65.54% |

13.4. Classification 4: miscellaneous

| Algorithm | Correctly classified |
| --- | --- |
| SimpleLogistic | 66.67% |
| SMO | 58.76% |


Table 14
Results of Phase II – Treatment Taxon – full abstract study with percentage split (66%).

14.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes net | 80% |
| Complement Naive Bayes | 81.67% |
| DMNB text | 68.33% |
| Naive Bayes | 63.33% |
| Naive Bayes multinomial | 80% |
| Naive Bayes updateable | 63.34% |
| Naive Bayes multinomial updateable | 80% |

14.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 65% |
| J48 graft | 73.33% |
| LAD tree | 50% |
| Random forest | 70% |
| Random tree | 38% |
| REP tree | 80% |
| Simple cart | 75% |
| BF tree | 64% |
| FT | 65% |
| Decision stump | 76.7% |
| LMT | 70% |

14.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| Zero R | 53.33% |
| Ridor | 61.67% |
| PART | 68.33% |
| One R | 77% |
| JRip | 52% |
| Decision table | 80% |
| NNge | 60% |

14.4. Classification 4: miscellaneous

| Algorithm | Correctly classified |
| --- | --- |
| Classification via regression | 78.33% |
| Multiclass classifier | 43.33% |
| Simple logistic | 70% |
| SMO | 56.67% |
| Attribute selected classifier | 78.33% |
| Bagging | 78.33% |
| Classification via clustering | 51.67% |
| CV parameter selection | 53.33% |
| Dagging | 68.33% |
| Decorate | 78.33% |
| END | 81.67% |
| Ensemble selection | 78.33% |
| Filtered classifier | 80% |
| Grading | 54% |
| Logit boost | 73.4% |
| Multi boost AB | 76.67% |
| Multi scheme | 53.33% |


Table 15
Results of Phase I – accounting area – keywords study.

15.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes net | 60.33% |
| Naïve Bayes | 69.116% |
| Naïve Bayes multinomial | 69.116% |
| Naïve Bayes multinomial updateable | 69.116% |
| Complement Naïve Bayes | 45.35% |

15.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 68% |
| J48 graft | 68% |
| Random forest | 65.51% |
| Random tree | 65.51% |
| Simple cart | 64.43% |

15.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| ZeroR | 60% |
| PART | 52% |
| JRip | 61.32% |
| Decision table | 60.2762% |
| Conjunctive rule | 60.8287% |
| Ridor | 54% |

Table 16
Results of Phase II – Accounting Area – full abstract study with percentage split (66%).

16.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes net | 83.2% |
| Complement Naive Bayes | 85.31% |
| Naive Bayes | 80.42% |
| Naive Bayes multinomial | 71.36% |

16.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 74% |
| J48 graft | 74% |
| Random forest | 73% |
| Random tree | 57% |

16.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| ZeroR | 73% |
| Ridor | 71.36% |
| PART | 71.36% |
| One R | 71.36% |
| JRip | 74.33% |


Table 17
Results of Phase I – mode of reasoning keywords study.

17.1. Classification 1: Bayes classifier

| Algorithm | Correctly classified |
| --- | --- |
| Bayes net | 50.32% |
| Naïve Bayes | 53.32% |
| Naïve Bayes multinomial | 51.02% |
| Naïve Bayes multinomial updateable | 54.62% |
| Complement Naïve Bayes | 27% |
| DMNB text | 53.32% |

17.2. Classification 2: decision trees classifier

| Algorithm | Correctly classified |
| --- | --- |
| J48 | 56.11% |
| J48graft | 56.11% |
| Random forest | 53.89% |
| Random tree | 48.89% |
| Simple CART | 54.44% |
| BFTree | 55.55% |
| Decision stump | 55.55% |
| REP tree | 53.33% |

17.3. Classification 3: rule based classifier

| Algorithm | Correctly classified |
| --- | --- |
| Zero R | 53.33% |
| PART | 54.44% |
| JRip | 54.16% |
| Decision table | 54.16% |
| Conjunctive rule | 56.42% |
| Ridor | 52.22% |
| One R | 56.62% |

Table 18
Results of Phase II: mode of reasoning full abstract study.

18.1 Classification 1: Bayes classifier

Algorithm                            Correctly classified
Bayes net                            40%
Complement Naïve Bayes               48.12%
Naïve Bayes                          46.69%
Naïve Bayes multinomial              47.05%
DMNB text                            48.13%
Naïve Bayes multinomial updateable   47.05%
Naïve Bayes updateable               46.69%

18.2 Classification 2: decision trees classifier

Algorithm                            Correctly classified
J48                                  35.45%
J48graft                             36.61%
Random forest                        42.35%
Random tree                          31.58%
Simple CART                          39.85%

18.3 Classification 3: rule based classifier

Algorithm                            Correctly classified
Zero R                               40.21%
Ridor                                33.02%
PART                                 34.46%
JRip                                 43.45%
Conjunctive rule                     40.93%
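
One way to read the Phase II figures is against a trivial baseline. Zero R ignores all attributes and always predicts the most frequent class, so its accuracy marks what a classifier achieves without learning anything from the text. The sketch below is an illustration only and not part of the original study: the accuracies are copied from Table 18, while the function name `margins_over_baseline` and the choice of Zero R as reference are assumptions made here.

```python
# Percent correctly classified, copied from Table 18 (Phase II, full abstracts).
PHASE_II = {
    "Bayes net": 40.00,
    "Complement Naïve Bayes": 48.12,
    "Naïve Bayes": 46.69,
    "Naïve Bayes multinomial": 47.05,
    "DMNB text": 48.13,
    "Naïve Bayes multinomial updateable": 47.05,
    "Naïve Bayes updateable": 46.69,
    "J48": 35.45,
    "J48graft": 36.61,
    "Random forest": 42.35,
    "Random tree": 31.58,
    "Simple CART": 39.85,
    "Zero R": 40.21,
    "Ridor": 33.02,
    "PART": 34.46,
    "JRip": 43.45,
    "Conjunctive rule": 40.93,
}


def margins_over_baseline(results, baseline="Zero R"):
    """Return (algorithm, margin) pairs, largest margin first.

    The margin is the algorithm's accuracy minus the baseline's accuracy,
    in percentage points. The baseline itself is left out of the result.
    Hypothetical helper written for this illustration.
    """
    base = results[baseline]
    pairs = [
        (name, round(acc - base, 2))
        for name, acc in results.items()
        if name != baseline
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, margin in margins_over_baseline(PHASE_II):
        print(f"{name:<36} {margin:+6.2f}")
```

Passing the Table 17 figures instead gives the same comparison for Phase I, where Zero R stands at 53.33%.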
