Resolving an Anomaly: Distributed Eprints Archives and Scientometrics
www.eprints.org

Upload: helen-townsend

Post on 27-Dec-2015


H. G. Wells, World Brain: The Idea of a Permanent World Encyclopaedia

Encyclopédie Française, August, 1937

• Encyclopaedias of the past sufficed for the needs of a cultivated minority
  – universal education was unthought of
  – gigantic increase in recorded knowledge

• Discontent with the role of universities and libraries in the intellectual life of mankind

• Universities multiply but do not enlarge their scope
  – thought & knowledge organization of the world

• No obstacle to the creation of an efficient index to all human knowledge, ideas and achievements

The Optimal and Inevitable for Researchers

All of this will come to pass. The only question is “How Soon?”

• The entire full-text refereed corpus online

• On every researcher’s desktop, everywhere

• 24 hours a day

• All papers citation-interlinked

• Fully searchable, navigable, retrievable

• For free, for all, forever

Globalizing Research

[Figure: diagram with the labels “Impact”, “Access”, “Harvard”, “The Rest” and “financial firewalls”]

The Subversive Proposal: Sufficient to free entire refereed corpus forever, immediately:

1. Universities install off-the-shelf, OAI-compliant Eprint software

2. Authors self-archive (preprints & postprints)

3. Institutions subsidize first start-up wave of self-archiving

4. The Give-Away corpus is freed

___________________________________________________________

Hypothetical Sequel:

5. Users prefer free version?

6. Publisher S/L/P (subscription/site-license/pay-per-view) revenues shrink, library S/L/P savings grow?

7. Publishers downsize to QC/C (quality-control/certification) service-providers + optional add-ons?

8. QC/C service costs funded by author-institution out of reader-institution S/L/P savings?

Five Essential PostGutenberg Distinctions:
(if you don’t make them, none of this will make sense)

1. Distinguish the non-give-away vs. give-away literature
   Litmus test: “Does the author seek a royalty/fee?”: books (yes) vs. refereed journal papers (no)

2. Distinguish income (from paper sale) vs. impact (from paper use)

(and distinguish give-away-author imprint-income [0] vs. impact-income [??])

3. Distinguish give-away author copyright protection from theft-of-authorship (wanted) vs. protection from theft-of-text (unwanted)

4. Distinguish self-publishing (vanity press) vs. self-archiving (of published, refereed research)

5. Distinguish unrefereed preprints vs. refereed postprints
   “eprints” = preprints + postprints

Zeno’s Prima-FaQs

“I worry about self-archiving because…”:

1. Preservation

2. Authentication

3. Corruption

4. Navigation (info-glut)

5. Certification

6. Evaluation

7. Peer review

8. Paying the piper

9. Downsizing

10. Copyright

11. Plagiarism

12. Priority

13. Censorship

14. Capitalism

15. Readability

16. Graphics

17. Publishers’ future

18. Libraries’ future

19. Learned Societies’ future

20. University conspiracy

21. Serendipity

22. Tenure/Promotion

23. (your prima-FaQ here…)

Answers available at < http://cogsci.soton.ac.uk/~harnad/Tp/resolution.htm >

www.eprints.org

Eprints < www.eprints.org > is dedicated to freeing the research literature, pre- and post-refereeing, through author/institution self-archiving in interoperable Open Archives < www.openarchives.org >.

To help the self-archiving initiative quickly gain momentum, archive-creating software has been developed at the University of Southampton. It is compliant with the OAI protocol, hence fully interoperable with all other Open Archives.

Eprints is designed to be as flexible and adaptable as possible, so that all universities world-wide can immediately adopt and configure it with minimal effort for all their disciplines’ self-archiving needs.

The Eprints software has been available (for free, of course) from eprints.org since December 2000.
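As a rough illustration of what interoperability through the OAI protocol can mean in practice, the sketch below builds a harvesting request and reads titles out of a response. It makes several assumptions: the base URL is hypothetical, the verb and parameter names (`ListRecords`, `metadataPrefix`, `from`) follow the OAI protocol for metadata harvesting as later standardised, and the XML sample is a hand-written, simplified stand-in rather than a real archive response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; a real archive publishes its own OAI base URL.
BASE_URL = "http://archive.example.org/oai"

# Dublin Core element namespace, used for the metadata fields.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords harvesting request (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


# Simplified, hand-written stand-in for a harvested response (not real data).
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>Example eprint A</dc:title></record>
  <record><dc:title>Example eprint B</dc:title></record>
</records>"""


def titles(xml_text):
    """Return every Dublin Core title found in the XML text."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC + "title")]


url = list_records_url(BASE_URL, from_date="2001-01-01")
harvested = titles(SAMPLE)
```

Because every compliant archive answers the same kind of request, a service provider could in principle run the same harvesting loop over many institutional archives.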

From Linear Growth to Exponential

• arXiv submission rates: linear growth only
• 30% of citations to papers deposited in arXiv
• Exponential growth in archiving to catch up with paper-based research
• 100% of papers archived, in all disciplines

[Figure: deposit rates plotted against time and disciplines]

Wells’s Global Research Database?

Multiple Updates by LANL Subfield (based on LANL meta-data)

[Figure: bar chart of No. of Papers with Updates (0 to 25,000) per subfield: adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int, hep-th, hep-ph. Series: No Updates, 1 Update, 2 Updates, 3 Updates, 4 Updates.]

Citation Linking & Scientometric Analysis

New OAI Services

Citation-Ranked Searches

Citation-based Visualisation

• The raw data show that the latency of the citation peak has been decreasing over the period of the archive

[Figure: Frequency of Citation Latencies: 1992-1999. x-axis: Time Difference/Months (0 to 96); y-axis: Citations (0 to 5,000); one series per year, 92 to 99.]

Decreasing Citation Latencies
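A minimal sketch of how a citation latency histogram might be computed, assuming that latency means the number of months between the deposit of a cited paper and the deposit of a paper citing it (the slides do not spell out the definition). The records and function names below are hypothetical and only illustrate the calculation.

```python
from collections import Counter
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from one date to a later one."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Hypothetical records: (deposit date of cited paper, deposit date of citing paper).
citations = [
    (date(1992, 3, 1), date(1993, 3, 1)),
    (date(1992, 3, 1), date(1994, 9, 1)),
    (date(1998, 1, 1), date(1998, 4, 1)),
    (date(1998, 1, 1), date(1998, 4, 15)),
    (date(1998, 6, 1), date(1999, 6, 1)),
]


def latency_histograms(pairs):
    """Map year of the cited paper to a Counter of latency (months) -> citations."""
    by_year = {}
    for cited, citing in pairs:
        latency = months_between(cited, citing)
        by_year.setdefault(cited.year, Counter())[latency] += 1
    return by_year


def peak_latency(histogram):
    """Latency with the most citations; ties go to the shorter latency."""
    return max(histogram, key=lambda months: (histogram[months], -months))


histograms = latency_histograms(citations)
peaks = {year: peak_latency(h) for year, h in histograms.items()}
```

Comparing the peak latency across years of deposit is one way to check whether the peak moves towards shorter delays over time.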

The “New Paper Rush”

• Users subscribe to an email alerting service that informs them of new papers.

[Figure: Age of paper against number of downloads. x-axis: Age of Paper (days); y-axis: Number of Downloads (0 to 50,000).]

[Figure: hep-th. x-axis: month (1991-07 to 2000-01); y-axis: Papers (0 to 200); series: With J-R, With J-R/Report, Report, Unknown.]

• The curve for papers with a journal reference [J-R] crosses the curve for papers without a J-R at an age of 13 months, suggesting a time difference of 13 months between preprint and postprint

Article Embryology
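The crossing point mentioned above can be located with a simple scan. This is a sketch under stated assumptions: the two lists hold counts of papers at each age in months, and the numbers are invented for illustration, not taken from the hep-th data.

```python
def crossover_age(with_jr, without_jr):
    """First age (list index, in months) at which papers with a journal
    reference are at least as numerous as papers without one."""
    for age, (with_count, without_count) in enumerate(zip(with_jr, without_jr)):
        if with_count >= without_count:
            return age
    return None  # the curves never cross within the data


# Hypothetical counts for ages 0..5 months.
with_jr = [0, 10, 20, 35, 50, 60]
without_jr = [100, 90, 80, 65, 50, 40]

age = crossover_age(with_jr, without_jr)
```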

• The papers were split into three sets based on the number of citations to them.

• There are an equal number of citations to the papers in the low, medium and high sets.

Effect of Paper Impact
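One way to build three sets that each receive an equal share of citations is sketched below. It assumes the papers are ranked by citation count and cut wherever the running total passes one third and two thirds of all citations; the slides do not describe the actual procedure, and the paper identifiers and counts are hypothetical.

```python
def split_by_equal_citations(citation_counts, n_sets=3):
    """Split papers into n_sets groups, from low to high impact, so that each
    group receives roughly the same total number of citations."""
    ranked = sorted(citation_counts.items(), key=lambda item: item[1])
    total = sum(citation_counts.values())
    sets = [[] for _ in range(n_sets)]
    running = 0  # citations accumulated by lower-ranked papers so far
    for paper, cites in ranked:
        index = min(running * n_sets // total, n_sets - 1) if total else 0
        sets[index].append(paper)
        running += cites
    return sets


# Hypothetical citation counts per paper.
counts = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 4, "h": 6}

low, medium, high = split_by_equal_citations(counts)
totals = [sum(counts[p] for p in group) for group in (low, medium, high)]
```

With a skewed citation distribution, the low set ends up holding many papers and the high set very few, even though each set accounts for the same number of citations.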

• High impact authors update more than medium or low

• High and medium impact authors deposit more papers than low

Quartile   Total    % Total   Citations   Papers   Citations/Author/Paper   Deposits   Mean Updates/Author
High 25%   798      2.09%     240,092     2,732    0.11                     6,720      0.48
Med 50%    9,262    24.20%    733,272     37,318   0.00212                  93,671     0.37
Low 25%    28,211   73.71%    251,925     67,951   0.000131                 165,971    0.27

Author Impact Quartiles
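The derived columns in the quartile table can be checked with a few lines of arithmetic. The sketch assumes that "Total" is the number of authors in each group and that "Citations/Author/Paper" is citations divided by authors and then by papers; both readings are inferred from the numbers rather than stated on the slide.

```python
# Figures copied from the quartile table; the dictionary keys are our own labels.
rows = {
    "High": {"authors": 798, "citations": 240_092, "papers": 2_732},
    "Med": {"authors": 9_262, "citations": 733_272, "papers": 37_318},
    "Low": {"authors": 28_211, "citations": 251_925, "papers": 67_951},
}

total_authors = sum(row["authors"] for row in rows.values())

for row in rows.values():
    # Share of all authors who fall in this group, as a percentage.
    row["share_of_authors"] = 100 * row["authors"] / total_authors
    # Citations divided by authors, then by papers.
    row["cites_per_author_per_paper"] = (
        row["citations"] / row["authors"] / row["papers"]
    )
```

Under these assumptions the computed shares and ratios round to the values shown in the "% Total" and "Citations/Author/Paper" columns.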

Citation Quality

• Papers generally cite papers of like impact (2 underway).

[Figure: Do Papers Cite Papers of Like Impact? No of Citations (0 to 140,000) by Source Impact and Dest. Impact, each Low, Medium or High.]

Histogram of Citations per Paper (author impact)

30,000 papers were by authors with no citation

[Figure: histogram of Papers (0 to 40,000) by citations received: No citations, 1 Citation, 2/3 Citations, 4/5/6 Citations, 7/8/9/10 Citations, 11 or more Citations. Series by author impact: High (2.53%), Medium (34.55%), Low (62.92%).]

• A small number of papers receive a very large number of citations

Citation Spread

Effect of Paper Impact on Usage

• Higher impact papers have a longer download life expectancy.

[Figure: All Papers. x-axis: Age of paper (days), 0 to 2,398; y-axis: Frequency Density (0 to 0.0025); series: High (2.0%), Medium (7.7%), Low (46.5%), Unknown (39.6%).]

Correlating citations and downloads

• There is a significant positive correlation between citations and downloads for high impact papers.

Download type                  r          n
All Papers                     0.11155    63671
High Impact Papers (2.0%)      0.27293    1981
Medium Impact Papers (7.7%)    0.01288    5937
Low Impact Papers (46.5%)      -0.01412   30163
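To see how the sample size bears on significance, a reported coefficient can be converted to a t statistic with t = r·sqrt((n − 2)/(1 − r²)). The sketch assumes that r is a Pearson correlation and n the number of papers; the threshold of 1.96 is the usual large-sample approximation for a two-sided 5% test and is our choice, not something stated on the slide.

```python
import math


def t_statistic(r, n):
    """t statistic for a Pearson correlation r computed from n observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# (r, n) pairs copied from the table above.
reported = {
    "All Papers": (0.11155, 63671),
    "High Impact Papers": (0.27293, 1981),
    "Medium Impact Papers": (0.01288, 5937),
    "Low Impact Papers": (-0.01412, 30163),
}

CRITICAL = 1.96  # assumed two-sided 5% threshold (normal approximation)

t_values = {name: t_statistic(r, n) for name, (r, n) in reported.items()}
significant = {name: abs(t) > CRITICAL for name, t in t_values.items()}
```

Under these assumptions the high-impact value lies far above the threshold, while the medium-impact value falls below it.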

Implementation Issues

• Creating new metadata

vs

• Creating new services