Digital repositories and versions of academic papers
Frances Shipsey and Louise Allsop, VERSIONS Project
Library, London School of Economics and Political Science
ALISS Christmas Special: Libraries and Open Access Scholarship
British Library Conference Centre, 11 December 2006
The VERSIONS Project (www.lse.ac.uk/versions)
• VERSIONS: Versions of Eprints – user Requirements Study and Investigation of the Need for Standards
• Funded by the Joint Information Systems Committee (JISC) under the Digital Repositories Programme
• London School of Economics and Political Science (LSE) – lead partner; Nereus – consortium of European research libraries specialising in economics – associate partners
• Eprints – Economics – European
• July 2005 to February 2007
Nereus – a network of European economics research libraries www.nereus4economics.info
Economists Online – a pilot search service - http://nereus.uvt.nl/eo
Economists Online – institutional pages
Economists Online – an author page
Economists Online – content is stored in Nereus partners’ institutional repositories
LSE institutional repository cover sheet
Notes indicate that differences between this version and published version remain
Focus on economics
• Established preprint culture – working papers and use of RePEc archive – discipline is already open access?
• Sue Sparks report on disciplinary differences:
‘What is the single most essential resource you use, the one that you would be lost without?’ Economists responded:
• 18.2% preprints
• 9.1% postprints
• 54.5% journal articles
• 18.2% datasets
Sue Sparks. JISC Disciplinary Differences Report. Rightscom Ltd, August 2005. Appendix C, Table 43. http://www.jisc.ac.uk/uploaded_documents/Disciplinary%20Differences%20and%20Needs.doc
Issues regarding versions
• Do authors have the ‘usable’ version to deposit in IRs?
• Can they produce it (easily) on request?
• Level of awareness about publisher permissions
• What are their attitudes towards making these versions publicly available?
• Any differences between UK and other European countries regarding population of repositories
• Experience of researchers – is it a problem sorting through multiple versions?
• Citing other authors’ work - issues
VERSIONS Project – User requirements study 2006
• Online survey ‘Versions of academic papers online – the experience of authors and readers’, conducted May-July 2006
• 464 responses from academic researchers
• 76% of researcher respondents from economics and econometrics
• A variety of roles, from PhD student through to professor
• Good geographic spread
• In addition, 133 responses from stakeholders - separate survey, not covered in as much detail here
VERSIONS Survey of academic researchers
Creation, storage and dissemination of versions
• Research active (50% wrote 4 or more papers in the past 2 years)
• Interviews with researchers showed very large numbers of revisions being produced and kept (as many as 60 or 70 in some cases)
• More difficult for researchers to retrieve older papers
- May be left in a previous institution
- Different servers, PCs and other storage media – dispersal
- Problems with older software packages
- Some much older material not available in digital form
- Drafts and revisions not clearly labelled, so researcher cannot now identify wanted version
How many versions do researchers produce?
• Researchers regularly produce numerous outputs from a single research project
• 59% typically produce 4 or more different types of research output per project, 33% produce 5 or more (Question 4)
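[Chart for Question 4: bar chart, vertical axis 0 to 450 respondents. Bar values as extracted: 57, 408, 363, 273, 153, 66, 19, 6, 1. Category labels not recoverable.]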
Which types of output do researchers produce?
• Journal articles are the most common output, with a wide range of others preceding, accompanying or following them (Question 4)
[Chart for Question 4: bar chart, vertical axis 0 to 500 respondents. Output types shown: Conference paper; Presentation; Working paper (no quality control); Working paper (quality control); Member series (NBER, IZA); Journal article (refereed); Journal article (unrefereed); Report for funding body; Book chapter; Book; Dataset; Thesis; Other. Bar values not recoverable.]
Which versions do researchers keep?
The majority of respondents personally keep / plan to keep major, but not all, revisions of their research papers stored in electronic form (e.g. computer or network drive) at the end of the process:
[Pie chart, values paired with legend entries in extracted order:]
- Keep all revisions: 36%
- Keep major revisions but not all: 54%
- Keep the latest revision that I worked on only: 8%
- Other: 2%
- Do not keep: 0%
Permanent storage by authors of multiple versions of their journal articles
VERSIONS survey of researchers Q7: ‘Which of the following versions of a paper, that you have written for publication in a refereed journal, would you personally keep (eg on your own computer or network drive)?’
Revision stage | Percentage of respondents who keep this stage permanently | Number of respondents who keep this stage permanently
Early draft version(s) before circulating to anyone, other than co-authors | 39.9% | 185
Draft version circulated to colleagues or peers for feedback before submission | 53.9% | 250
Version submitted to a journal for peer review | 78.9% | 366
Final author version produced by yourself/co-authors – agreed with the journal following referee comments | 90.7% | 421
Proof copy (publisher-produced version) | 62.5% | 290
Final published version (publisher-produced PDF) | 91.8% | 426
Satisfaction with management of personal versions
Survey respondents were split almost evenly between satisfaction and dissatisfaction with the organisation of their own revisions and versions on their own computers or storage media (Question 9):
[Pie chart, values paired with legend entries in extracted order:]
- Yes: 49%
- No, not completely: 48%
- Don't know: 3%
- Don't produce research outputs: 0%
Personal version management - examples
• Systems detailed include dating in file name, version control by number, by software system, and retaining the latest version only.
‘Every filename includes a date e.g. hello111206. That way it is easy to find the latest version among co-authors and for myself’.
‘I use version numbers e.g. “paper 2.1.doc”, changing the second number with each edit of any significance and the first number if there is a milestone in the process – team review / change of direction etc. I keep the milestone versions in a backup folder within the main folder that the document is developed in.’
‘Version control system – CVS or Subversion.’
‘I usually throw away everything as soon as I have a new version. Unless, for example, the new version is in another language or has some substantial changes in it, so that I may need the first version for some other purpose.’
Examples - continued
• Problems cited include changing between computers, hoarding too many versions, management of co-authored papers, maintaining an awareness of differences between versions, insufficient naming systems and accidental editing of the wrong versions
‘I am using two different computers for my work. As a result, sometimes I have my work at different stages on the two computers. I would like to find an easy way to get both systems up to date at any point in time.’
‘I think that I keep too many old versions. I like to keep several while working on the paper in case files get corrupted etc. but I seldom go back afterwards and delete all unimportant versions.’
‘The problem is that co-authors sometimes do revisions on the wrong version. We don’t agree which is the latest version.’
‘What I am not so efficient at is distinguishing between versions with limited differences, and those where substantial changes have been made.’
‘I do not have a consistent renaming system. This causes major problems in finding the correct version after a long period.’
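The two naming systems respondents describe (a date in the file name, and a major.minor version number) can be combined in a single convention. The following is a minimal sketch only: the pattern `<stem>_v<major>.<minor>_<YYYYMMDD>.<ext>` and the helper names `make_filename`, `parse_filename` and `latest` are illustrative assumptions, not something reported by the survey.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not from the survey):
#   <stem>_v<major>.<minor>_<YYYYMMDD>.<ext>
PATTERN = re.compile(
    r"^(?P<stem>.+)_v(?P<major>\d+)\.(?P<minor>\d+)_(?P<date>\d{8})\.(?P<ext>\w+)$"
)


def make_filename(stem, major, minor, when, ext="doc"):
    """Build a file name carrying both a version number and a date."""
    return f"{stem}_v{major}.{minor}_{when:%Y%m%d}.{ext}"


def parse_filename(name):
    """Return (major, minor, yyyymmdd), or None if the name does not follow the pattern."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return (int(m["major"]), int(m["minor"]), m["date"])


def latest(names):
    """Pick the name with the highest version, using the date as a tie-breaker."""
    valid = [(parse_filename(n), n) for n in names if parse_filename(n) is not None]
    if not valid:
        return None
    return max(valid)[1]


files = [
    "paper_v1.0_20060501.doc",
    "paper_v2.0_20061101.doc",
    make_filename("paper", 2, 1, date(2006, 12, 11)),
    "notes.txt",  # ignored: does not follow the convention
]
print(latest(files))
```

Writing the date as year, month, day means that names also sort chronologically in an ordinary file listing, which helps co-authors agree on which file is the latest.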
Responsibility for secure storage of different versions (Questions 10-13)
[Chart for Questions 10-13: grouped bar chart, vertical axis 0 to 500 respondents. Versions shown: Early draft version(s); Draft to colleagues/peers; Submitted to journal for peer review; Agreed with journal post-refereeing; Publisher proof; Final published version (often PDF); None of these; Don't know. Series: Authors/Co-authors; Authors' institutions (inc. libraries); Publishers; Subject repositories. Bar values not recoverable.]
Dissemination of different versions
• In addition to refereed academic journals and university/institutional collections, respondents disseminate their full text research findings through a range of other channels (Question 17):
- Personal website / homepage [301 respondents]
- University website for working paper / discussion paper series [256 respondents]
- RePEc (IDEAS, EconPapers) [209 respondents]
- SSRN [181 respondents]
- Other [60 respondents]
Which versions?
• Respondents were asked which versions of their academic papers they were interested in making openly accessible, if permitted (Question 19):
[Bar chart, vertical axis 0 to 450 respondents. Values paired with categories in extracted order:]
- Draft to colleagues/peers: 116
- Version for peer review: 191
- Final version, post-refereeing: 274
- Publisher proof: 100
- Final published version (often PDF): 385
- Don't know: 3
- Don't produce: 3
Copyright awareness
• The survey revealed significant uncertainty relating to copyright issues among its respondents:
53.7% of respondents reported limited or no understanding of which version(s) of academic papers, intended for publication in refereed academic journals, they are allowed to disseminate in full text, in which locations, and at which times
[Bar chart, number of respondents:]
- Full understanding: 46
- Some understanding: 136
- Limited understanding: 182
- No understanding: 40
- Don't know: 10
Copyright awareness in relation to repositories
• 68.3% of researchers stated that they were unsure whether publisher copyright agreements permit them to place final author versions into institutional repositories:
Question 16a, ix) To what extent do you agree with the following: ‘I am unsure whether the publisher copyright agreement permits me to provide this version [final author version for use in an institutional repository]’?
[Bar chart, number of respondents:]
- Strongly agree: 138
- Slightly agree: 179
- Slightly disagree: 36
- Strongly disagree: 36
- Don't know: 72
- Don't produce papers: 3
Citing versions in the face of change
• Respondents were asked how they prefer to cite earlier versions of papers that have subsequently been published in journals (Question 24):
[Horizontal bar chart, number of respondents, values paired with categories in extracted order:]
- Don't know: 12
- Do not cite any version of the paper if I have not read the published version: 22
- Cite earlier author version that I have read: 58
- Cite published version and earlier author version that I have read: 33
- Cite published version only: 339
Responses indicate that many researchers spend time reading both versions to ensure no major differences
Multiple versions – experience of readers
• 93% of respondents reported finding more than one full text version / copy of a paper online (Question 22)
• When asked whether it is generally quick and easy to establish which version(s) they wish to read, respondents answered as follows (Question 23):
[Pie chart, values paired with legend entries in extracted order:]
- Yes: 54%
- No: 41%
- Do not find multiple versions: 5%
Recent projects and initiatives on versions
Ongoing standards development work:
• NISO/ALPSP Working Group on Versions of Journal Articles
- Publisher-led group, with larger review group made up of publishers, librarians and other stakeholders
- Draft documents including Terms and Definitions for versions (March 2006): Author’s Original, Accepted Manuscript, Proof, Version of Record, Updated Version of Record
- http://www.niso.org/committees/Journal_versioning/JournalVer_comm.html
Two JISC activities during 2006:
• RIVER – Scoping Study on Repository Version Identification
- Led by Rightscom Ltd, with partners London School of Economics and Political Science Library, University of Oxford Computing Services
- Defined two broad classes of requirement for version identification (Collocation and Disambiguation), and defined a tentative typology of ‘versions’ (March 2006)
- http://www.jisc.ac.uk/uploaded_documents/RIVER%20Final%20Report.pdf
• JISC Eprints Application Profile Working Group
- Approach based on Functional Requirements for Bibliographic Records (FRBR) and DCMI Abstract Model, more detail and structure than Dublin Core (June – August 2006) – work going forward through DCMI Task Group
- Deals with versions very well
- http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
How can social science librarians advise academic authors?
• Provide information to researchers about permitted use of different versions of journal articles through the SHERPA/RoMEO database: www.sherpa.ac.uk/romeo.php
• Explain the term ‘postprint’ in researchers’ own language: eg ‘accepted version’ or ‘accepted manuscript’
• Strongly encourage researchers to keep these ‘accepted manuscript’ versions (in Word as well as in PDF) and to obtain them from co-authors if they don’t have the latest
• Encourage use of date in authors’ manuscript versions (date manuscript completed)
• Provide general information about depositing papers
- BOAI Self-Archiving FAQ - http://www.eprints.org/openaccess/self-faq/
- Peter Suber’s Open Access Overview - http://www.earlham.edu/~peters/fos/overview.htm
• If appropriate, provide advice about ongoing management of versions – consider whether this can be incorporated into training for research students
How can librarians improve version identification in their IRs?
• Store metadata in richer format than simple Dublin Core, for example Eprints Application Profile
• Add version identification metadata
• Consider use of cover sheets (some search engines take users directly to the full text document, bypassing metadata)
• When evaluating repository software, include version identification in criteria
• Look at RIVER report recommendations to universities (IR managers)
• Monitor JISC work on version identification and future guidelines
• Encourage your library to develop a policy on version identification in the IR
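As an illustration of the points on version identification metadata and cover sheets, the sketch below attaches a version term to a repository record and renders a plain-text cover sheet from it. It is a sketch under stated assumptions: the version terms are those of the NISO/ALPSP draft listed earlier, but the `VersionRecord` fields and the `cover_sheet` layout are hypothetical and do not reproduce the Eprints Application Profile or any repository software's schema.

```python
from dataclasses import dataclass
from datetime import date

# Version terms from the NISO/ALPSP draft Terms and Definitions (March 2006).
NISO_ALPSP_TERMS = (
    "Author's Original",
    "Accepted Manuscript",
    "Proof",
    "Version of Record",
    "Updated Version of Record",
)


@dataclass
class VersionRecord:
    """Hypothetical record; field names are illustrative assumptions."""
    title: str
    version_term: str
    manuscript_date: date          # date the manuscript was completed
    differs_from_published: bool   # whether differences from the published version remain

    def __post_init__(self):
        # Reject free-text labels so versions stay comparable across records.
        if self.version_term not in NISO_ALPSP_TERMS:
            raise ValueError(f"unknown version term: {self.version_term!r}")


def cover_sheet(record):
    """Render a plain-text cover sheet to be stored with the full text."""
    lines = [
        f"Title: {record.title}",
        f"Version: {record.version_term}",
        f"Manuscript date: {record.manuscript_date.isoformat()}",
    ]
    if record.differs_from_published:
        lines.append("Note: differences between this version and the published version may remain.")
    return "\n".join(lines)


record = VersionRecord("Example paper", "Accepted Manuscript", date(2006, 12, 11), True)
print(cover_sheet(record))
```

Because the cover sheet is generated from the same record as the metadata, a reader who arrives at the full text directly from a search engine still sees the version information.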
Reproduced by permission of the National Library of Ireland and X Communications