Based on the Los Alamos National Laboratory Digital Archive, we present our research into the use of the archive by authors and researchers from the international physics community.
Professor Stevan Harnad, University of Southampton
Dr. Les Carr, University of Southampton
Tim Brody, Ian Hickman
Open Citation Project - http://opcit.eprints.org/
e-print e-mbryology
Growth of the LANL Archive
• Since the start of the archive in 1991 its usage has been steadily growing
• Now, after 10 years, it has over 130,000 papers
[Figure: Deposit Frequency. x-axis: Month (1991-07 to 2000-01); y-axis: Deposit Frequency (0 to 3000).]
LANL Authors
• Number of unique names identified in each year
• *Before 1995 author meta-data is missing in most sub-fields
Number of Identified Authors per Year
Year       Authors
1991*      411
1992*      1152
1993*      1439
1994*      5958
1995       15198
1996       17762
1997       22359
1998       27785
1999       32673
2000-06    19593
Citation Identification
• Based on automatic extraction and identification from the document source (Adobe Acrobat - .pdf)
• We have defined terminology for two types of citation:
  – “Red-Link”: an author cites a LANL pre-print article using a LANL reference (e.g. hep-th/0006010)
  – “Orange-Link”: an author cites a post-print, published article that is also deposited in the archive (e.g. Phys. Rev. D56 6588 (1997))
• Red and Orange links together identify about 650,000 of the roughly 3,000,000 citations found in 130,000 papers
          Count     % (red+orange)   % all (3083763)
Red       259437    40.02%           8.41%
Orange    388904    59.98%           12.61%
Total     648341    100.00%          21.02%
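The identification step described above can be illustrated with a minimal Python sketch. It assumes the reference strings have already been extracted from the document source. The pattern `LANL_ID`, the helper `normalise`, the function `classify_citation` and the lookup set `deposited` are all hypothetical names introduced for illustration, not the project's actual implementation, and the third reference string is a made-up placeholder.

```python
import re

# Assumed pattern for LANL references such as "hep-th/0006010":
# an archive name (optionally with a dotted suffix), a slash, then seven digits.
LANL_ID = re.compile(r"\b([a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?)/(\d{7})\b")


def normalise(ref):
    """Collapse whitespace and case so journal references compare loosely."""
    return " ".join(ref.lower().split())


def classify_citation(ref, deposited_journal_refs):
    """Return 'red', 'orange' or 'unidentified' for one reference string."""
    if LANL_ID.search(ref):
        return "red"  # cites a pre-print by its LANL reference
    if normalise(ref) in deposited_journal_refs:
        return "orange"  # cites a published article that is also deposited
    return "unidentified"


# Hypothetical lookup set, as if built from journal-reference meta-data.
deposited = {normalise("Phys. Rev. D56 6588 (1997)")}

refs = [
    "hep-th/0006010",
    "Phys. Rev. D56 6588 (1997)",
    "Some Journal 1 (1990) 1",  # placeholder that matches nothing
]
labels = [classify_citation(r, deposited) for r in refs]
print(labels)
```

Exact string matching on journal references is a deliberate simplification; real reference strings vary far more in punctuation and ordering than this sketch allows for.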
Identification Ratio
• Currently 25% of citations are being identified
[Figure: Citation Validation - Identification Ratio Against Time. x-axis: Month (1991-08 to 2000-02); y-axis: Identified Citations Ratio (0 to 0.3). Series, each with a trend line: Red Links/Total Citations, Orange Links/Total Citations, Total Identified/Total Citations.]
Identification Ratio - hep-th
• Currently 40% of citations in hep-th (High Energy Physics - Theory) papers directly cite pre-print articles in LANL
[Figure: Citation Validation - Identification Ratio Against Time (hep-th). x-axis: Month (1991-08 to 2000-03); y-axis: Identified Citations Ratio (0 to 0.7). Series: Red (121241 - 21.04%), Orange (85202 - 14.79%), Identified (206443 - 35.83%).]
Citation Latencies
• The raw data show that the latency of the citation peak has been decreasing over the lifetime of the archive
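A citation latency histogram of this kind could be built roughly as follows. This is a minimal sketch under stated assumptions: latency is taken to be the number of whole months between the deposit of the cited paper and the deposit of the citing paper, and curves are grouped by the cited paper's deposit year. The function names and the sample dates are invented for illustration.

```python
from collections import Counter
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def latency_histogram(citations):
    """Count citations per latency in months, grouped by cited paper's year.

    `citations` is an iterable of (cited_deposit_date, citing_deposit_date).
    """
    hist = {}
    for cited, citing in citations:
        latency = months_between(cited, citing)
        if latency < 0:
            continue  # skip citations to a later deposit (e.g. via an update)
        hist.setdefault(cited.year, Counter())[latency] += 1
    return hist


def peak_latency(counter):
    """Latency with the most citations; ties go to the shorter latency."""
    return min(counter, key=lambda m: (-counter[m], m))


# Invented sample data, for illustration only.
pairs = [
    (date(1992, 3, 1), date(1993, 3, 1)),
    (date(1992, 5, 1), date(1993, 5, 1)),
    (date(1992, 5, 1), date(1992, 9, 1)),
    (date(1999, 1, 1), date(1999, 4, 1)),
    (date(1999, 2, 1), date(1999, 5, 1)),
    (date(1999, 2, 1), date(2000, 1, 1)),
]
hist = latency_histogram(pairs)
peaks = {year: peak_latency(counts) for year, counts in hist.items()}
print(peaks)
```

Comparing `peaks` across years is one way to see whether the citation peak is moving towards shorter latencies.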
[Figure: Frequency of Citation Latencies: 1992-1999. x-axis: Time Difference/Months (0 to 96); y-axis: Citations (0 to 5000). One curve per year, 92 to 99.]
[Figure: Normalised and Scaled Graph of Citation Latency: 1992-1999. x-axis: Time Difference/Months (0 to 96); y-axis: References (0 to 0.18). One curve per year, 92 to 99.]
Citation Latencies
• Normalised data are corrupted by an artefact in the citation ratios (used to adjust for time)
Updates to LANL Pre-prints
• The LANL archive allows authors to update articles that they have deposited
[Figure: Multiple Updates by LANL Subfield (based on LANL meta-data). Subfields: adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int, hep-th, hep-ph. x-axis: No. of Papers with Updates (0 to 25000). Series: No Updates, 1 Update, 2 Updates, 3 Updates, 4 Updates.]
[Figure: Update Frequency against Time (normalised; based on moving average over 3 points). x-axis: Time Difference (days) (0 to 70); y-axis: Frequency (Papers) (0 to 0.08). Series: 1st Update, 2nd Update, 3rd Update and 4th Update trends.]
Update Delay
• There are too few values to provide an accurate frequency, so a trend must be estimated
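One simple way to estimate such a trend from sparse counts is a three-point moving average, which is the smoothing named in the title of the update frequency graph. The sketch below is illustrative only: the centred window, the handling of the end points and the function names are assumptions, and the sample counts are invented.

```python
def moving_average(values, window=3):
    """Centred moving average; edge points use the neighbours that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def normalise(counts):
    """Scale counts so that they sum to 1 (a relative frequency)."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


# Invented counts of updates per day, for illustration only.
counts = [0, 3, 0, 6, 0, 3]
trend = moving_average(counts)
print(trend)
print(moving_average(normalise(counts)))
```

A wider window gives a smoother trend at the cost of blurring short-lived peaks.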
[Figure: hep-th. x-axis: Month (1991-07 to 2000-01); y-axis: Papers (0 to 200). Series: With J-R, With J-R/Report, Report, Unknown.]
Article Embryology
• The number of papers with a journal reference [J-R] overtakes the number without a J-R at an age of 13 months, suggesting a delay of about 13 months between pre-print and post-print
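The crossing point can be located mechanically once the two series are available. The following is a minimal sketch, assuming one count per month of paper age for each series. The function name `crossover_age` and the sample counts are invented for illustration and do not reproduce the archive data.

```python
def crossover_age(with_jr, without_jr):
    """First age (list index, in months) at which papers with a J-R
    outnumber papers without one. Returns None if they never cross."""
    for age, (w, wo) in enumerate(zip(with_jr, without_jr)):
        if w > wo:
            return age
    return None


# Invented counts indexed by paper age in months, for illustration only.
with_jr = [0, 5, 10, 20, 40]
without_jr = [100, 80, 60, 30, 35]
print(crossover_age(with_jr, without_jr))
```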
Article State by Sub-field
• Self-professed state of the article (hep is updated by SLAC/SPIRES)
[Figure: Retrospective Paper State by LANL Subfield. Subfields: adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, hep-ph, hep-th, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int. x-axis: Papers (0 to 25000). Series: With J-R, With J-R/Report, Reports, To Appear, Submitted, Accepted, Thesis, Other.]
[Figure: Does Author Impact affect the state of articles? x-axis: Author Impact Level (total papers): All (132218), Low (38.66%), Medium (22.23%), High (2.62%); y-axis: 0% to 100%. Series: Accepted, J.Ref, J.Ref/Report, Report, Review, Submitted, Unknown.]
State of Cited Articles
• Broken down by papers written by authors with given impact level
• Author impact determined by “Red-Link” citations
Author Deposit Rates
• 50% of deposits of new papers occur within 4 months of the author’s previous paper
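A figure like this can be derived from the gaps between consecutive deposits by the same author. The sketch below is a hedged illustration: it assumes gaps are measured in whole calendar months and that author names have already been disambiguated. The function names and the sample records are invented.

```python
from collections import defaultdict
from datetime import date


def deposit_gaps(deposits):
    """Months between each deposit and the same author's previous deposit.

    `deposits` is an iterable of (author_name, deposit_date) pairs.
    """
    by_author = defaultdict(list)
    for author, when in deposits:
        by_author[author].append(when)

    gaps = []
    for dates in by_author.values():
        dates.sort()
        for prev, cur in zip(dates, dates[1:]):
            gaps.append((cur.year - prev.year) * 12 + (cur.month - prev.month))
    return gaps


def fraction_within(gaps, months):
    """Share of gaps that are at most `months` long."""
    return sum(g <= months for g in gaps) / len(gaps) if gaps else 0.0


# Invented deposit records, for illustration only.
deposits = [
    ("A. Author", date(1998, 1, 10)),
    ("A. Author", date(1998, 3, 2)),
    ("A. Author", date(1999, 3, 2)),
    ("B. Author", date(1997, 6, 1)),
    ("B. Author", date(1997, 9, 1)),
]
gaps = deposit_gaps(deposits)
print(sorted(gaps), fraction_within(gaps, 4))
```

The smallest value of `months` for which `fraction_within` reaches 0.5 is the median gap, which is the quantity the bullet above reports as 4 months.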
[Figure: Author Frequency of Deposits. x-axis: Time (Months) (0 to 48); left y-axis: Deposits (0 to 35000); right y-axis: Cumulative (0 to 154000). Series: Real Time, Cumulative.]
Author Impact Analysis
• There is a co-author list for each paper
• Author impact is defined as the number of citations an author receives divided by the number of papers that author has deposited (the mean number of citations for an author)
• By applying this to each author, a list of author names with their impact is constructed
• The authors are ranked by their impact
• The set of authors is then divided into three impact sets: lowest 25%, middle 50% and highest 25%
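The steps above can be sketched in Python as follows. Two assumptions are made and should be read as such: every co-author is credited with all of a paper's citations, and the 25% / 50% / 25% split is made by rank position. The author counts reported in the quartile table suggest the original split may have been made on a different basis, so this is an illustration of the idea, not a reconstruction of the method. All names and sample data are invented.

```python
from collections import defaultdict


def author_impact(papers):
    """Mean citations per deposited paper for each author.

    `papers` is an iterable of (co_author_list, citation_count) pairs.
    Assumption: each co-author is credited with all of a paper's citations.
    """
    citations = defaultdict(int)
    deposited = defaultdict(int)
    for authors, cites in papers:
        for name in authors:
            citations[name] += cites
            deposited[name] += 1
    return {name: citations[name] / deposited[name] for name in deposited}


def impact_sets(impact):
    """Rank authors by impact and split the ranking 25% / 50% / 25%.

    Assumption: the split is by rank position, not by impact value.
    """
    ranked = sorted(impact, key=lambda name: (impact[name], name))
    n = len(ranked)
    low_end, high_start = n // 4, n - n // 4
    return {
        "low": ranked[:low_end],
        "medium": ranked[low_end:high_start],
        "high": ranked[high_start:],
    }


# Invented papers: (co-author list, citations received).
papers = [(["A", "B"], 10), (["A"], 0), (["C"], 4), (["D"], 1)]
impact = author_impact(papers)
sets = impact_sets(impact)
print(impact)
print(sets)
```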
[Figure: authors ranked by impact (x-axis: Authors; y-axis: Impact), divided into Low, Medium and High sets.]
Author Impact Quartiles
• High impact authors update more than medium or low
• High and medium impact authors deposit more papers than low
Quartile   Authors   % Total   Citations   Papers   Citations/Author/Paper   Deposits   Mean Updates/Author
High 25%   798       2.09%     240,092     2,732    0.11                     6,720      0.48
Med 50%    9,262     24.20%    733,272     37,318   0.00212                  93,671     0.37
Low 25%    28,211    73.71%    251,925     67,951   0.000131                 165,971    0.27
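The derived columns in the table can be checked from its raw columns. The short sketch below assumes that "% Total" is each set's share of all authors and that "Citations/Author/Paper" is citations divided by authors and then by papers. Both readings are inferred from the numbers, not stated in the slides.

```python
# Raw columns copied from the table above: (authors, citations, papers).
rows = {
    "High": (798, 240_092, 2_732),
    "Med": (9_262, 733_272, 37_318),
    "Low": (28_211, 251_925, 67_951),
}

total_authors = sum(authors for authors, _, _ in rows.values())

derived = {}
for name, (authors, citations, papers) in rows.items():
    share = 100 * authors / total_authors            # assumed "% Total"
    per_author_paper = citations / authors / papers  # assumed "Citations/Author/Paper"
    derived[name] = (share, per_author_paper)
    print(f"{name:<5} {share:6.2f}%  {per_author_paper:.3g}")
```

Under these two assumptions the recomputed values agree with the table to the precision it prints.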
[Figure: Cumulative Paper Frequency, by Author Impact. x-axis: Citations (0 to 240); left y-axis: Papers (0 to 35000); right y-axis: Papers (High Impact) (0 to 3000). Series: Medium (27317 - 44.91%), Low (30981 - 50.93%), High (2533 - 4.17%).]
Author’s Papers
• Low impact authors rarely, if ever, have a single high impact paper
[Figure: Histogram of Citations per Paper (author impact); 30,000 papers were by authors with no citation. x-axis: No citations, 1 Citation, 2/3 Citations, 4/5/6 Citations, 7/8/9/10 Citations, 11 or more Citations; y-axis: Papers (0 to 40000). Series: High (2.53%), Medium (34.55%), Low (62.92%).]
Citation Spread
• A small number of papers receive a very large number of citations