Plagiarism in residency application essays
Scott Segal, MD, MHCM
Brian J. Gelfand, MD
Shelley Hurwitz, PhD
Lori Berkowitz, MD
Stanley W. Ashley, MD
Eric S. Nadel, MD
Joel T. Katz, MD
Dr. Segal and Dr. Gelfand contributed equally to the study.
From Brigham and Women's Hospital, Harvard Medical School, Boston, MA,
Departments of Anesthesiology, Perioperative and Pain Medicine (SS, BJG), Internal
Medicine (JTK, SH), Emergency Medicine (ESN; also Massachusetts General Hospital
Department of Emergency Medicine), Obstetrics and Gynecology and Reproductive
Endocrinology (LB; also Massachusetts General Hospital Department of Obstetrics,
Gynecology and Reproductive Biology), and Surgery (SWA). Address correspondence
and reprint requests to Dr. Segal at Department of Anesthesiology, Perioperative and
Pain Medicine, Brigham and Women's Hospital, 75 Francis St., Boston, MA, 02115 or at
Word count: 3122 (text only)
Abstract word count: 270
Abstract
Background: Anecdotal reports suggest some residency application essays contain
plagiarized content.
Objective: To determine the prevalence of plagiarism in a large cohort of application
essays.
Design: Retrospective cohort study of residency application essays.
Setting: Single large academic medical center.
Participants: Essays submitted as part of applications to five large residency programs
at the same academic medical center between 9/1/2005 and 3/22/2007 in
anesthesiology, emergency medicine, internal medicine, obstetrics and gynecology, and
surgery (N=4975).
Measurements: Essays were compared to a database of Internet pages, published
works, and previously submitted essays by specialized software that returns a score of
0-100, representing the percentage of the submission matching another source. An
essay matching 10% or more of existing work was defined as evidence of plagiarism.
Results: Plagiarism was found in 5.2% (259/4975; 95% confidence interval [CI], 4.6%,
5.9%) of essays. Non-U.S. citizens’ essays were more likely to demonstrate evidence
of plagiarism; prevalence among U.S. citizens was 1.8% (1.4%, 2.3%), among foreign
nationals 13.9% (odds ratio [OR] 8.7 [95% CI, 6.4, 11.8], P<.0001), among permanent
residents 13.6% (OR 8.5 [5.7, 12.5], P<.0001), and among others 9.1% (OR 5.4 [1.6,
18.1], P=.0065). After adjustment for citizenship, other applicant characteristics
associated with a higher prevalence of plagiarism included medical school location
outside the U.S. and Canada; previous residency or fellowship; lack of research
experience, volunteer experience, or publications; a low USMLE Step 1 score; and
nonmembership in the Alpha Omega Alpha honorary society.
Limitations: The software database is likely incomplete, the 10% match threshold for
defining plagiarism has not been statistically validated, and the study was confined to
applicants to one institution.
Conclusions: Although more common in international applicants, plagiarism in
residency application essays was found in all specialties, from applicants of all medical
school types, and even among those with significant academic honors.
All applicants to United States residency programs must complete an original
essay known as the “Personal Statement.” The format is free-form and the content is
not specified. Expectations may vary by specialty; common themes include description
of motivation for seeking training in a chosen specialty, clarification of factors impacting
suitability for a field or program, relaying a critical incident that impacted career choice,
and circumstances that distinguish the applicant from others. In recent years,
numerous Web sites have offered advice and sample essays to assist applicants in this
effort. The Association of American Medical Colleges' (AAMC) Electronic Residency
Application Service (ERAS) lists several such sites but cautions applicants to submit
only original, personal work (1). ERAS also warns applicants that “any substantiated
findings of plagiarism may result in reporting of such findings to the programs to which
[they] apply now and in the future” (2). Applicants must certify that their work is accurate
and original before an ERAS application is complete.
Although anecdotal reports of personal statement plagiarism have appeared in
online discussion boards, no systematic investigation of the incidence of plagiarism has
been undertaken. The development of software that can efficiently search large
databases of previously created content has dramatically improved the ability to detect
plagiarized essays in other contexts. Using such software adapted for application
essays, we undertook an analysis of personal statements submitted with residency
applications in five specialties. The software returns a “similarity score” representing
the percentage of the submitted essay matching content in the database. The primary
goal of this investigation was to use this methodology to estimate the prevalence of
plagiarism in applicants’ personal statements at our institution, and determine its
association with demographic, educational, and experience-related applicant
characteristics.
METHODS
Participants
Personal statements in applications to residency programs at Brigham and
Women's Hospital in Boston, MA in five specialties, with planned starts in July 2007,
were analyzed for plagiarism. Applications were submitted between 9/1/2005 and
3/22/2007. Specialties studied were internal medicine, anesthesiology, general surgery,
obstetrics and gynecology, and emergency medicine, which are the five largest
residency programs at the institution. Because applicants had previously submitted the
material to the residency programs after attesting to its originality and veracity, and
because anonymity was assured by the study design, the institutional review board
approved the protocol and waived the requirement for informed consent from the
applicants.
Measurements and data collection
Essays were exported from the respective programs’ ERAS databases and
de-identified by replacing the applicant name and application number with a
computer-generated random number. A single copy of the master key linking code
numbers to the applicants’ Association of American Medical Colleges (AAMC) number
was retained. Other applicant data (gender, age, medical school type and location,
volunteer, research, or work experience, publications, Alpha Omega Alpha membership,
United States Medical Licensing Examination [USMLE] scores, citizenship, self-reported
“language fluency other than English”, previous residency or fellowship, certification
status by the Educational Commission for Foreign Medical Graduates [ECFMG]) was
exported from the respective ERAS databases, matched with the computer-generated
code numbers using AAMC number and specialty of application as keys, and
de-identified. Only data exportable as categorical information from ERAS were used. Free
text fields, medical school and undergraduate college names, specific volunteer,
research, or work activities, individual language skills, honors other than AOA, hobbies
and interests, and information about interviews and ranking for the National Resident
Matching Program (NRMP) were not analyzed.
Essays were analyzed for possible plagiarism by specialized software (Turnitin
for Admissions Essays, iParadigms, LLC, Oakland, CA). The program uses the
technique of digital fingerprinting to represent groups of characters in the source
document as numeric sequences, which are then sampled in groups of approximately 40
characters. The fingerprint of the submitted essay is then compared to a database
including web pages, printed resources, and previously submitted essays (3, 4). The
algorithm used efficiently searches for exact or close matches within 40 character
strings but also rejects very common phrases (3). The system also ignores material in
the submitted essay enclosed in quotation marks. The software returns a “similarity
score” between 0 and 100, representing the percentage of the submitted essay that
matches to a source in the database. A report showing the original source material
matching the submitted essay and highlighting the matching passages is also returned.
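The fingerprinting approach described above can be illustrated with a minimal sketch in the spirit of the winnowing algorithm (reference 4): hash overlapping character strings, then keep only a sample of the hashes. The parameters, normalization, and function names below are illustrative assumptions, not Turnitin's actual (proprietary) implementation.

```python
import hashlib

def fingerprint(text, k=40, window=4):
    # Normalize: lowercase and strip whitespace so trivial edits do not
    # defeat matching (an illustrative choice, not Turnitin's).
    text = "".join(text.lower().split())
    # Hash every k-character substring (a "k-gram").
    hashes = [
        int.from_bytes(hashlib.md5(text[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]
    # Winnowing-style sampling: keep the minimum hash of each sliding
    # window, so only a fraction of hashes represents the document.
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}

def similarity(essay, source, k=40, window=4):
    """Percentage of the essay's fingerprints also found in the source."""
    fe, fs = fingerprint(essay, k, window), fingerprint(source, k, window)
    return 100 * len(fe & fs) / len(fe) if fe else 0
```

Under this sketch, an essay identical to a source scores 100 and one sharing no long character run scores 0, mirroring the 0-100 similarity score the software returns.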
The Turnitin system will match submitted essays to others previously submitted
(which are permanently incorporated into its database) and will show such matches
preferentially over matches to an outside source. A matched essay can be identified
only by the institution that submitted it, and it was not possible to determine whether
the matching material also matched an Internet or other outside source.
Therefore, the order of submission can affect the similarity scores returned, so the initial
similarity scores were adjusted to resolve matches to other essays within the submitted
cohort. For essays matching solely to one other essay also submitted by the
investigators, we determined the author of the essays by comparing the coded identities
to the applicant’s unique applicant number within ERAS but maintained the de-
identification in the submitted essays. If an applicant applied to more than one specialty
and matched his or her own essay but neither essay matched any external source, the
score for both essays was set to 0. If an applicant matched his or her own essay and
that essay also matched an outside source, then the scores of both essays were set to
the lesser of the two scores originally reported for the applicant’s essays. If
an essay matched another in the submitted cohort from a different applicant, then
scores were kept as originally reported, although it was not possible to determine which
essay copied which or whether both copied an unidentified third source.
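The adjustment rules just described can be summarized in a short sketch; the function name and arguments are illustrative, not code used in the study.

```python
def adjust_scores(score_a, score_b, same_author, external_match):
    """Resolve a within-cohort match between two submitted essays.

    same_author: True if both essays came from the same applicant.
    external_match: True if the matched essay also matched an outside source.
    Returns the adjusted (score_a, score_b).
    """
    if same_author and not external_match:
        # Self-match only (one applicant applying to two specialties):
        # no evidence of plagiarism, so both scores become 0.
        return 0, 0
    if same_author and external_match:
        # Self-match plus an outside source: both essays receive the lesser
        # of the two originally reported scores.
        low = min(score_a, score_b)
        return low, low
    # Different applicants: scores stay as originally reported, since it is
    # unknown which essay copied which (or whether both copied a third source).
    return score_a, score_b
```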
The similarity score distribution was highly skewed; 75% of scores were zero.
Among non-zero scores, the median and mean were 3.0 and 8.2. Therefore, the
similarity score was dichotomized (plagiarism or not plagiarism) for analysis. Two
independent investigators (B.G. and S.S.) examined the unadjusted similarity scores
and reports for 100 randomly selected essays in which the score was >0 to determine
approximate cutoff values to use to dichotomize the scores into likely and unlikely
plagiarism. Scores from 0-4 typically reflected matches to proper names of individuals
or institutions, or relatively common short phrases, and therefore were generally thought
to represent minimal chance of true plagiarism. Scores between 4 and 10 represented
a mixture of similar innocent matches and others that were judged likely to be
plagiarism because of the original source material to which the essay matched.
Subsequent inspection of all essays with scores ≥10 (N=259, including 41 from the
screening sample) by two observers (B.G. and S.S.) revealed no cases deemed to be
false positives by either observer. Plagiarism was therefore defined as a similarity
score ≥10 and all analyses were carried out using this cutoff value, consistent with other
application essay plagiarism detection systems using different software (5).
Statistical analysis
Baseline characteristics of the applicant pools across specialties were compared
with the chi-square test for categorical variables or the Kruskal-Wallis test for age,
essay length, and USMLE Step 1 score. The proportion of similarity scores greater than or equal to
10 was calculated with 95% confidence intervals for the entire cohort and for each
applicant characteristic, and the ability of various individual variables available from the
ERAS databases to predict plagiarism was analyzed by logistic regression. Potential
collinearity was checked with diagnostics and examination of the variances of the
estimates. Multiple variable logistic regression was used to adjust for the influence of
US citizenship (yes/no) or US or Canadian medical school status (yes/no) in evaluating
the ability of each single variable to predict plagiarism. The results using US citizenship
and the results using US or Canadian medical school were very similar; the results
using US citizenship are reported here. Where the interaction between the variable and
US citizenship was significant, we report the subgroup results. All statistical tests were
two-sided, nominal P-values are reported, and SAS version 9.1 and JMP version 7.0
(both from SAS Institute, Cary, NC) were used.
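As a rough check on the reported figures, the overall prevalence and its confidence interval can be computed directly from the counts given in the Results (259 of 4975 essays). The sketch below uses a normal-approximation interval for illustration, which may differ in the last digit from the paper's (unstated) interval method; the odds-ratio helper takes counts, and the values shown in its test are hypothetical.

```python
import math

def prevalence_ci(k, n, z=1.96):
    """Proportion k/n with a normal-approximation 95% confidence interval."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

def odds_ratio(k1, n1, k0, n0):
    """Odds ratio of an outcome in group 1 versus reference group 0."""
    return (k1 / (n1 - k1)) / (k0 / (n0 - k0))

# Overall prevalence of plagiarism: 259 of 4975 essays, about 5.2%.
p, lo, hi = prevalence_ci(259, 4975)
```

With the normal approximation the interval is about (4.6%, 5.8%), close to the reported (4.6%, 5.9%), which was presumably computed with a slightly different method.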
Role of the Funding Source
No external funding source supported this study.
RESULTS
Applicant Characteristics
A total of 5010 applicant files were analyzed, and 4975 essays were screened for
evidence of plagiarism. Thirty-five files either had no essay or the essay could not be
extracted for technical reasons. The characteristics of the applicants within each specialty and for
the entire cohort are shown in Table 1. There were significant differences across
specialties for all abstracted features except for research experience.
Plagiarism
Plagiarism, defined as a similarity score greater than or equal to 10, was found in
5.2% (95% CI 4.6%, 5.9%) of essays overall (Table 2). The Figure shows excerpts
from three representative examples of plagiarized essays with scores of 10%, 34%, and 71%.
The distribution of similarity scores for the entire cohort is shown in Table 3. Essay
length was weakly inversely associated with similarity score (adjusted r2=.007,
P=.0016).
Essays from international applicants, whether defined by citizenship, medical
school type, or medical school location, were significantly more likely to demonstrate
plagiarism (Table 2). Essays from applicants with membership in Alpha Omega Alpha
(AOA; an honorary society recognizing students in the top 10-20% of their medical
school class), research experience, volunteer experience, and higher USMLE Step 1
scores were less likely to demonstrate plagiarism (Table 2). Essays from older
applicants, those indicating language fluency other than English, and those with
previous residency training showed a significantly higher prevalence of plagiarism
(Table 2). Controlling for international applicant status eliminated the effect of age and
other language fluency and reduced the effects of previous residency training, research
experience, volunteer experience, and AOA membership (Table 2). Among applicants
from non-U.S. or Canadian medical schools, certification by the Educational
Commission for Foreign Medical Graduates (ECFMG) was associated with a
significantly higher prevalence of plagiarism (Table 2). Comparison of prevalence of
plagiarism across specialties showed only Obstetrics and Gynecology to differ from the
others (Table 2, AOR 1.6 [95% CI 1.1, 2.3], P=.0073).
Sources of material to which essays with similarity scores ≥10 matched are
shown in Table 4.
DISCUSSION
Plagiarism may be defined as “the action or practice of taking someone else's
work, idea, etc., and passing it off as one's own; literary theft” (6). The recognition of
falsified material on applications for postgraduate medical training has been previously
documented. Investigators have reported a 10-30% prevalence of misleading
publication citations, including reference to non-existent articles, journals, or authorship
(7-13). One small study of 26 applications to a single family practice fellowship found
three cases of plagiarism in the application essays (14). The present study is the first
report of widespread plagiarism in personal statements in residency applications.
The results of the present investigation suggest plagiarism in approximately one
in twenty applications. Plagiarism was seen in essays from applicants to all specialties,
from domestic and foreign medical schools, in both genders, and among applicants with
strong academic credentials. Although confined to a single institution, our sample
represents 28-45% of the total number of applicants nationwide in the studied
specialties (15). It is unclear whether applicants to the five competitive programs in our
institution are wholly representative of the overall applicant pool, but the large fraction
applying makes systematic bias unlikely.
Our finding of significant plagiarism in residency application essays is worrisome.
First, it is likely that residency selection committees would find misrepresentation on the
application to be a strongly negative indicator of future performance as a resident. The
Accreditation Council for Graduate Medical Education (ACGME) has deemed
professionalism one of the six core competencies to be taught and assessed in
undergraduate and graduate medical education (16). The
authors believe that program directors would find a breach of professionalism in the
application to be an unacceptable baseline from which to begin residency. Second,
lapses in professionalism in medical school (18) and residency training (19) can be
predictive of future disciplinary action by state medical boards. Third, increasing public
scrutiny of physicians’ ethical behavior is likely to put pressure on training programs to
enforce strict rules of conduct, beginning with the application process.
The growth of the Internet has been implicated in the mounting evidence for
widespread plagiarism in high school, undergraduate and graduate education (20, 21).
Besides making vast amounts of material available for “cut and paste” or “cyber-”
plagiarism, the Internet appears to have changed how students regard material
available online compared with material published in print form. Copying from the Internet is viewed by a
disconcertingly large fraction of students as acceptable practice and not true plagiarism
(22). In response, several Web-based software solutions have been developed within
the liberal arts and other non-scientific communities to detect intentional
misrepresentation. The largest such service (turnitin.com) has been used successfully
at thousands of undergraduate and graduate institutions worldwide (23) and was
selected for the present study. Only recently has this methodology been
applied to scientific material, but these efforts have already identified significant
episodes of plagiarism and duplicate publication within the biomedical literature (24).
Our study adapted the latest iteration of this technique to target application essays.
Our study was not designed to determine the best cutoff value in the similarity
score returned by the software for differentiating true plagiarism from innocent matches
to other documents. Inspection of a number of similarity reports (which show the
matching content from the submitted essay and source material in the turnitin.com
database) demonstrated no cases judged to be innocent matches when the score was
≥10. This value was used as the dichotomous cutoff for our definition of plagiarism in
the present study. A similar system and 10% cutoff are used in the U.K. as part of the
Universities and Colleges Admissions Service (UCAS) (5). Table 3 illustrates the
potential effect of other cutoff values on the prevalence estimate of plagiarism.
However, only a formal sensitivity analysis of varying cutoff values could determine an
optimal value. Unfortunately, this would require an agreed-upon gold standard for
plagiarism, which does not exist.
There are, therefore, a number of factors that could make our prevalence
estimate either too high or too low. An essay that quotes extensive material in a block
not enclosed by quotation marks, uses multiple relatively short common phrases, or
cites at length an individual’s title or institution might match the database yet represent
appropriate use of the source material. It was not possible for us to vary the
40-character string length used by the software for matching to source material. In our
data, however, individual review of the 259 essays with a similarity score of ≥10
revealed no such instances felt to represent false positives. An essay previously
submitted by the author in a different setting or year might also match extensively to the
probed content, also yielding a false positive result. However, the Turnitin system had
not been used by any other residency programs prior to our study (personal
communication, Jeffrey Lorton, iParadigms, LLC).
Conversely, many factors may have led us to underestimate the true prevalence
of plagiarism. First, the turnitin.com database is likely incomplete. The database was
originally conceived to screen undergraduate and arts and science graduate school
applications, not medical school or residency applications. Furthermore, applicants
plagiarizing from other applicants’ essays that are not in the public space, or previously
submitted for analysis, would be missed by the software. This could include material
found only in books, essays passed directly between candidates, or essays submitted
by others in prior years. Second, close paraphrasing involving careful word substitution
may represent plagiarism but nonetheless escape detection by the software. Finally,
we found numerous examples of unattributed use of material that we strongly felt to be
plagiarism but that did not reach the 10% similarity score threshold.
Therefore, we believe that our prevalence estimate is conservative, and plagiarism is
likely more common than we report.
There are other limitations of our study. First, the detection software is limited by
the incorporation of submitted material into its database and its preference for showing
matches to other essays in the database instead of external sources. This may lead to
different results for a given essay depending on the order of submission to the system
and may have biased the estimates of ultimate sources of plagiarism or the applicant
characteristics associated with plagiarism. Moreover, this aspect of the software limited
our ability to infer the ultimate causes of plagiarism, sources of plagiarized content, or
strategies that might be employed to prevent future occurrences.
Second, the system detects content matches, not plagiarism per se. Any
determination of the latter is necessarily subjective, as there is no gold standard for
plagiarism. Moreover, as Wilhoit has observed, “plagiarism covers a multitude of errors,
ranging from sloppy documentation and proofreading to outright, premeditated fraud”
(25). In addition, students may genuinely be unaware of the generally accepted
standards for originality in their writing and their attitudes may differ strikingly from those
of faculty members (26). This may be particularly relevant for international applicants,
some of whom hail from cultures which do not view copying the work of others in the
same negative light as do Western academics (27, 28). Still another possibility is
cryptomnesia (“hidden memory”), in which an applicant is not aware that he or she is
copying remembered content from another source (27). Though our 10%
content-matching threshold has been used by others (5) and yielded no cases we
judged to be false positives, we cannot make any inferences regarding the intent of the
authors.
Third, many of the univariate predictors of plagiarism we identified likely interact
with each other. For example, publication history, older age, and other language
fluency correlated with previous residency training (data not shown). We reported each
characteristic as a univariate predictor of plagiarism, as our intent was to estimate the
prevalence of plagiarism, not to derive a rigorous predictive model for it. Only a
multivariate analysis in a new dataset would be able to find such a model.
How should residency selection committees respond to our findings? By
predetermined study design, the authors agreed to maintain the anonymity of the
shielded essays that were submitted and not unmask the identity of the applicants, so
we did not determine if any applicants with plagiarized essays were invited to interview
or matched to our programs. However, we believe that action should be taken to
discourage and detect plagiarism in the future. Ideally, this would take place at the level
of ERAS. Besides making the process unavoidable for applicants, this would
eliminate the difficulties inherent in multiple programs using the software
simultaneously, because essays submitted to the software become part of the database
for future submissions. Furthermore, manual inspection of the similarity report itself,
rather than simply reporting the score, would allow individual program directors to make
independent judgments regarding the seriousness of any putative offense. Finally,
there is reason to believe that the mere knowledge that essays are being screened by
plagiarism detection software would have a significant deterrent effect on would-be
plagiarizers (29, 30).
In summary, we have detected evidence of plagiarism among applicants to
residency programs in five specialties at a major academic training hospital. Although
more common in international applicants, the offense was found in all specialties, from
applicants of all medical school types, and even among those with significant academic
honors. We believe a concerted, nationwide effort to detect and deter such plagiarism
is warranted.
Figure legend
Figure: Excerpts of representative similarity reports from personal statements.
Submitted essays are on the left and the matched source material is on the right.
Matching text is highlighted in red. A. An essay scoring 10% (the threshold used to
define plagiarism). B. An essay scoring 34%. C. An essay scoring 71%.
Article and Author Information
Funding/support: Software was purchased with departmental funds (Anesthesiology,
Perioperative and Pain Medicine).
Financial disclosures: None of the authors has any relevant financial disclosures.
Author Contributions:
Study concept and design: Segal, Gelfand, Katz
Acquisition of data: Segal, Gelfand, Katz, Nadel, Berkowitz, Ashley
Analysis and interpretation of data: Segal, Hurwitz, Gelfand
Drafting of the manuscript: Segal, Gelfand
Critical revision of the manuscript for important intellectual content: Segal, Gelfand,
Katz, Nadel, Berkowitz, Ashley, Hurwitz
Statistical analysis: Segal, Hurwitz
Study supervision: Segal, Gelfand, Katz
References
1. Association of American Medical Colleges. The ERAS Integrity Promotion
Education Program. [cited 2009 February 3]; Available from:
http://www.aamc.org/students/erasfellow/policies/integrityeducation.htm.
2. Association of American Medical Colleges. MyERAS User Guide. 2009
[updated 2009; cited 2009 February 3]; Available from:
http://www.aamc.org/students/eras/resources/downloads/2009myerasresidency9
08.pdf.
3. Introna L, Hayes N. Plagiarism detection systems and international
students: detecting plagiarism, copying or learning? In: Roberts TS, editor.
Student plagiarism in an online world : problems and solutions. Hershey, PA:
Information Science Reference; 2008. p. 108-22.
4. Schleimer S, Wilkerson DS, Aiken A. Winnowing: local algorithms for
document fingerprinting. Proceedings of the 2003 ACM SIGMOD international
conference on Management of data; 2003; San Diego, CA. ACM; 2003. p. 76-85.
5. UCAS Similarity Detection Service - guidance for applicants. [cited 2009
March 30]; Available from:
http://www.ucas.com/students/startapplication/apply09/personalstatement/similari
tydetection.
6. Oxford English Dictionary Online. Oxford University Press; 2009 [updated
2009; cited 2009 June 3]; Available from: http://dictionary.oed.com.
7. Baker DR, Jackson VP. Misrepresentation of publications by radiology
residency applicants. Acad Radiol. 2000 Sep;7(9):727-9.
8. Cohen-Gadol AA, Koch CA, Raffel C, Spinner RJ. Confirmation of
research publications reported by neurological surgery residency applicants.
Surg Neurol. 2003 Oct;60(4):280-3; discussion 3-4.
9. Dale JA, Schmitt CM, Crosby LA. Misrepresentation of research criteria by
orthopaedic residency applicants. J Bone Joint Surg Am. 1999 Dec;81(12):1679-
81.
10. Hebert RS, Smith CG, Wright SM. Minimal prevalence of authorship
misrepresentation among internal medicine residency applicants: do previous
estimates of "misrepresentation" represent insufficient case finding? Ann Intern
Med. 2003 Mar 4;138(5):390-2.
11. Konstantakos EK, Laughlin RT, Markert RJ, Crosby LA. Follow-up on
misrepresentation of research activity by orthopaedic residency applicants: has
anything changed? J Bone Joint Surg Am. 2007 Sep;89(9):2084-8.
12. Patel MV, Pradhan BB, Meals RA. Misrepresentation of research
publications among orthopedic surgery fellowship applicants: a comparison with
documented misrepresentations in other fields. Spine. 2003 Apr 1;28(7):632-6;
discussion 1.
13. Yang GY, Schoenwetter MF, Wagner TD, Donohue KA, Kuettel MR.
Misrepresentation of publications among radiation oncology residency applicants.
J Am Coll Radiol. 2006 Apr;3(4):259-64.
14. Cole AF. Plagiarism in graduate medical education. Fam Med. 2007
Jun;39(6):436-8.
15. National Resident Matching Program. Results and Data: 2008 Main
Residency Match. 2008 [updated 2008; cited 2009 March 1]; Available from:
http://www.nrmp.org/data/resultsanddata2008.pdf.
16. Accreditation Council for Graduate Medical Education. General
competencies. [cited 2009 January 26]; Available from:
http://www.acgme.org/outcome/comp/compMin.asp.
17. Taylor CA, Weinstein L, Mayhew HE. The process of resident selection: a
view from the residency director's desk. Obstet Gynecol. 1995 Feb;85(2):299-
303.
18. Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional
behavior in medical school is associated with subsequent disciplinary action by a
state medical board. Acad Med. 2004 Mar;79(3):244-9.
19. Papadakis MA, Arnold GK, Blank LL, Holmboe ES, Lipner RS.
Performance during internal medicine residency training and subsequent
disciplinary action by state licensing boards. Ann Intern Med. 2008 Jun
3;148(11):869-76.
20. Scanlon PM. Student online plagiarism: How do we respond? College
Teaching. 2003;51(4):161-5.
21. Scanlon PM, Neumann DR. Internet plagiarism among college students.
Journal of College Student Development. 2002;43(3):374-85.
22. Rimer S. A Campus Fad That's Being Copied: Internet Plagiarism Seems
on the Rise. New York Times. September 3, 2003;Sect. B7.
23. Royce J. Trust or Trussed? Has Turnitin.com Got It All Wrapped Up?
Teacher Librarian. 2003;30(4):26-30.
24. Errami M, Hicks JM, Fisher W, Trusty D, Wren JD, Long TC, Garner HR.
Deja vu--a study of duplicate citations in Medline. Bioinformatics. 2008 Jan
15;24(2):243-9.
25. Wilhoit S. Helping students avoid plagiarism. College Teaching.
1994;42(4):161-4.
26. DeVoss D, Rosati AC. “It wasn’t me, was it?” Plagiarism and the Web.
Computers and Composition. 2002;19:191-203.
27. Park C. In Other (People’s) Words: plagiarism by university students—
literature and lessons. Assessment & Evaluation in Higher Education.
2003;28(5):471-88.
28. Hayes N, Introna L. Systems for the Production of Plagiarists? The
Implications Arising from the Use of Plagiarism Detection Systems in UK
Universities for Asian Learners. Journal of Academic Ethics. 2005;3:55-73.
29. Bilic-Zulle L, Azman J, Frkovic V, Petrovecki M. Is there an effective
approach to deterring students from plagiarizing? Sci Eng Ethics. 2008
Mar;14(1):139-47.
30. Braumoeller BF, Gaines BJ. Actions Do Speak Louder than Words:
Deterring Plagiarism with the Use of Plagiarism-Detection Software. PS: Political
Science & Politics. 2001;34(04):835-9.
Table 1: Characteristics of applicants, by specialty

Characteristic | Anesthesiology | Emergency Medicine | Internal Medicine | Obstetrics/Gynecology | Surgery | All | P value (difference across specialties)
Total applicants | 692 | 582 | 2463 | 570 | 703 | 5010 |
Citizenship* | | | | | | | <.0001
  U.S. Citizen | 548 (79.2) | 521 (89.5) | 1654 (67.2) | 399 (70.0) | 459 (65.3) | 3581 (71.5) |
  Foreign national | 92 (13.3) | 44 (7.6) | 631 (25.6) | 117 (20.5) | 154 (21.9) | 1038 (20.7) |
  Permanent resident | 49 (7.1) | 16 (2.7) | 160 (6.5) | 51 (9.0) | 82 (11.7) | 358 (7.1) |
  Other | 3 (0.4) | 1 (0.2) | 18 (0.7) | 3 (0.5) | 8 (1.1) | 33 (0.7) |
Medical school type† | | | | | | | <.0001
  U.S. Private | 237 (34.3) | 249 (42.8) | 964 (39.1) | 150 (26.3) | 208 (29.6) | 1808 (36.1) |
  U.S. Public | 216 (31.2) | 222 (38.1) | 622 (25.3) | 211 (37.0) | 177 (25.2) | 1448 (28.9) |
  International | 187 (27.0) | 91 (15.6) | 833 (33.8) | 188 (33.0) | 305 (43.4) | 1604 (32.0) |
  Fifth pathway and other | 52 (7.5) | 20 (3.4) | 44 (1.8) | 21 (3.7) | 13 (1.9) | 150 (3.0) |
Medical school location | | | | | | | <.0001
  U.S. or Canada | 504 (72.8) | 491 (84.4) | 1627 (66.1) | 381 (66.8) | 395 (56.2) | 3398 (67.8) |
  Foreign | 188 (27.2) | 91 (15.6) | 836 (33.9) | 189 (33.2) | 308 (43.8) | 1612 (32.2) |
ECFMG status (non-U.S. medical school applicants) | | | | | | | <.0001
  Certified | 151 (80.3) | 59 (64.8) | 721 (86.2) | 162 (85.7) | 260 (84.4) | 1353 (83.9) |
  Not certified | 37 (19.7) | 32 (35.2) | 115 (13.8) | 27 (14.3) | 48 (15.6) | 259 (16.1) |
Gender | | | | | | | <.0001
  Female | 250 (36.2) | 268 (46.1) | 1121 (45.5) | 420 (73.7) | 233 (33.2) | 2292 (45.8) |
  Male | 441 (63.8) | 314 (53.9) | 1341 (54.5) | 150 (26.3) | 468 (66.8) | 2714 (54.2) |
Age | 27.7 (25.9-30.5) | 27.1 (25.8-29.7) | 27.0 (25.7-29.4) | 27.4 (25.8-30.8) | 27.5 (26.0-30.9) | 27.2 (25.8-29.9) | <.0001
Member of AOA | 40 (5.8) | 64 (11.0) | 319 (13.0) | 55 (9.7) | 52 (7.4) | 530 (10.6) | <.0001
USMLE Step 1 score | 219 (205-232) | 224 (209-235) | 232 (216-246) | 222 (206-232) | 225 (210-237) | 225 (211-240) | <.0001
Language fluency other than English‡ | 434 (63.2) | 352 (60.9) | 1779 (73.2) | 407 (72.0) | 508 (73.0) | 3480 (70.2) | <.0001
Previous residency | 177 (25.6) | 74 (12.7) | 357 (14.5) | 105 (18.4) | 202 (28.8) | 915 (18.3) | <.0001
Publications | 380 (54.9) | 330 (56.7) | 1630 (66.2) | 332 (58.3) | 467 (66.4) | 3139 (62.7) | <.0001
Research experience | 547 (79.0) | 470 (80.8) | 1998 (81.1) | 441 (77.4) | 558 (79.4) | 4014 (80.1) | .27
Volunteer experience | 587 (84.8) | 533 (91.6) | 2125 (86.3) | 492 (86.3) | 583 (82.9) | 4320 (86.3) | .0002

Values shown as N (%) or median (Q1-Q3). P value for difference across specialties by chi-square test for categorical variables or Kruskal-Wallis test for continuous variables.
ECFMG = Educational Commission for Foreign Medical Graduates; AOA = Alpha Omega Alpha medical honor society; USMLE = United States Medical Licensing Examination.
* "Other" includes conditional permanent resident (N=24) and refugee/asylum/displaced person (N=9).
† "Other" includes Canadian medical school and U.S. osteopathic medical school students or graduates.
‡ Includes any language fluency self-reported by candidates.
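The between-specialty comparisons above can be reproduced directly from the tabulated counts. As a minimal sketch, the chi-square test of independence named in the table footnote is applied here to the gender distribution across the five specialties, in pure Python (stdlib only); the closed-form survival function is exact for the 4 degrees of freedom of this 5×2 table.

```python
import math

# (female, male) applicant counts by specialty, taken from Table 1:
# anesthesiology, emergency medicine, internal medicine,
# obstetrics/gynecology, surgery
observed = [(250, 441), (268, 314), (1121, 1341), (420, 150), (233, 468)]

row_totals = [f + m for f, m in observed]
col_totals = [sum(r[j] for r in observed) for j in (0, 1)]
grand = sum(row_totals)

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells,
# with expected counts E = row_total * col_total / grand_total
chi2 = sum(
    (obs - exp) ** 2 / exp
    for row, rt in zip(observed, row_totals)
    for obs, exp in zip(row, (rt * ct / grand for ct in col_totals))
)

# df = (5 rows - 1) * (2 columns - 1) = 4; for 4 degrees of freedom the
# chi-square survival function has the closed form exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

The statistic is driven largely by obstetrics/gynecology (73.7% female against 45.8% overall), and the resulting P value is far below the table's reported <.0001.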
Table 2: Prevalence of plagiarism (similarity score ≥ 10) in entire cohort and by selected characteristics of the applicants

Characteristic | n/N | % (95% CI) | Unadjusted P value | P value adjusted for U.S. citizenship
Entire cohort | 259/4975 | 5.2 (4.5, 5.9) | -- | --
Specialty*
  Anesthesiology | 30/675 | 4.4 (2.9, 6.0) | .34 | .85
  Emergency medicine | 10/578 | 1.7 (0.7, 2.8) | .0002 | .052
  Internal medicine | 141/2456 | 5.7 (4.8, 6.7) | .09 | .94
  Obstetrics and gynecology | 44/569 | 7.7 (5.5, 9.9) | .0043 | .0073
  Surgery | 34/697 | 4.9 (3.3, 6.5) | .67 | .20
Citizenship
  U.S. Citizen | 65/3561 | 1.8 (1.4, 2.3) | Reference | --
  Permanent resident | 48/353 | 13.6 (10.0, 17.2) | <.0001 | --
  Foreign national | 143/1028 | 13.9 (11.8, 16.0) | <.0001 | --
  Other | 3/33 | 9.1 (0, 18.9) | .0065 | --
Medical school type†
  U.S. Private | 22/1799 | 1.2 (0.72, 1.7) | Reference | Reference
  U.S. Public | 15/1443 | 1.0 (0.52, 1.6) | .63 | .67
  International | 218/1589 | 13.7 (12.0, 15.4) | <.0001 | <.0001
  Fifth pathway and other | 4/144 | 2.8 (0.09, 5.5) | .13 | .86
Medical school location | | | <.0001 | <.0001
  U.S. or Canada | 41/3378 | 1.2 (0.84, 1.6)
  Foreign | 218/1597 | 13.7 (12.0, 15.3)
ECFMG status (non-U.S. medical school applicants) | | | .017 | .22
  Certified | 195/1340 | 14.6 (12.8, 16.5)
  Not certified | 23/257 | 8.9 (6.0, 13.1)
Gender | | | .78 | .17
  Female | 116/2278 | 5.1 (4.2, 6.0)
  Male | 142/2693 | 5.3 (4.4, 6.1)
Age at application | --/4971 | -- | <.0001 | .49
Alpha Omega Alpha | | | <.0001 | .025
  Member | 5/526 | 0.95 (0.12, 1.8)
  Non-member | 254/4449 | 5.7 (5.0, 6.4)
USMLE Step 1 score | --/2165 | -- | <.0001 | .0002
Language fluency other than English‡ | | | <.0001 | .06
  Yes | 232/3451 | 6.7 (5.9, 7.6)
  No | 25/1469 | 1.7 (1.0, 2.4)
Previous residency or fellowship | | | <.0001 | .0087
  Yes | 112/900 | 12.4 (10.3, 14.6)
  No | 147/4075 | 3.6 (3.0, 4.2)
Research experience | | | <.0001 | .0022
  Yes | 160/3985 | 4.0 (3.4, 4.6)
  No | 99/990 | 10.0 (8.1, 11.9)
Publications | | | .0069 | .024
  Yes | 142/3122 | 4.6 (3.8, 5.3)
  No | 117/1853 | 6.3 (5.2, 7.4)
Volunteer experience§ | | | <.0001 | .0081
  Yes | 181/4290 | 4.2 (3.6, 4.8)
  No | 78/685 | 11.4 (9.0, 13.8)

ECFMG = Educational Commission for Foreign Medical Graduates; USMLE = United States Medical Licensing Examination.
* P value for each specialty relative to all other specialties.
† "Other" includes Canadian medical school and U.S. osteopathic medical school students or graduates.
‡ Includes any language fluency self-reported by candidates.
§ Interaction between U.S. citizenship and volunteer experience (P=.004): OR=0.29 (95% CI 0.16, 0.54; P<.0001) for U.S. citizens; OR=0.81 (95% CI 0.58, 1.12; P=.20) for non-U.S. citizens. No other statistical interactions were found among the applicant characteristics.
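The overall prevalence estimate can be checked from the counts alone. The sketch below recomputes the entire-cohort confidence interval with the normal-approximation (Wald) formula; the published interval may have been computed by a different method (e.g., exact binomial or Wilson), so the last digit can differ slightly.

```python
import math

# Wald 95% CI for the overall plagiarism prevalence: 259 essays with a
# similarity score >= 10 out of 4975 essays analyzed
flagged, total = 259, 4975
p = flagged / total
se = math.sqrt(p * (1 - p) / total)          # standard error of a proportion
lo, hi = p - 1.96 * se, p + 1.96 * se        # normal-approximation bounds
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}%, {100 * hi:.1f}%)")
```

This yields roughly 4.6% to 5.8%, consistent with the reported 4.5% to 5.9% up to the choice of interval method.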
Table 3: Distribution of similarity scores

Similarity score | N | Proportion
0-9 | 4716 | 94.8%
  0 | 3737 | 75.1%
  1 | 160 | 3.2%
  2 | 331 | 6.7%
  3 | 166 | 3.3%
  4 | 98 | 2.0%
  5 | 70 | 1.4%
  6 | 54 | 1.1%
  7 | 40 | 0.8%
  8 | 29 | 0.6%
  9 | 31 | 0.6%
10-19 | 134 | 2.7%
20-29 | 52 | 1.1%
30-39 | 27 | 0.54%
40-49 | 13 | 0.26%
50-59 | 8 | 0.16%
60-69 | 7 | 0.14%
70-79 | 6 | 0.12%
80-89 | 4 | 0.08%
90-99 | 6 | 0.12%
100 | 2 | 0.04%
Total | 4975 | 100%
Alternative Table 3: Distribution of similarity scores

Similarity score | N | Proportion
0-9 | 4716 | 94.8%
  0 | 3737 | 75.1%
  1 | 160 | 3.2%
  2 | 331 | 6.7%
  3 | 166 | 3.3%
  4 | 98 | 2.0%
  5 | 70 | 1.4%
  6 | 54 | 1.1%
  7 | 40 | 0.8%
  8 | 29 | 0.6%
  9 | 31 | 0.6%
10-19 | 134 | 2.7%
  10 | 24 | 0.48%
  11 | 17 | 0.34%
  12 | 15 | 0.30%
  13 | 13 | 0.26%
  14 | 15 | 0.30%
  15 | 13 | 0.26%
  16 | 13 | 0.26%
  17 | 8 | 0.16%
  18 | 8 | 0.16%
  19 | 8 | 0.16%
20-29 | 52 | 1.1%
30-39 | 27 | 0.54%
40-49 | 13 | 0.26%
50-59 | 8 | 0.16%
60-69 | 7 | 0.14%
70-79 | 6 | 0.12%
80-89 | 4 | 0.08%
90-99 | 6 | 0.12%
100 | 2 | 0.04%
Total | 4975 | 100%
Table 4: Sources of material matching submitted essays with similarity score ≥ 10 (N=259)

Source | n | % (95% Confidence interval)
Other essay in cohort | 166 | 64.1 (58.1, 69.7)
  One essay in cohort | 92 | 35.5 (29.9, 41.5)
  Multiple essays in cohort | 74 | 28.6 (23.4, 34.4)
Essay on educational domain (.edu) website* | 56 | 21.6 (17.0, 27.0)
  One essay on .edu sites | 53 | 20.5 (16.0, 25.8)
  Multiple essays on .edu sites | 3 | 1.1 (0.39, 3.3)
Other educational domain web content* | 8 | 3.1 (1.6, 6.0)
Essay on non-educational domain (.com, .info, .org, .net) website* | 114 | 44.0 (38.1, 50.1)
  One essay on non-.edu site | 92 | 35.5 (29.9, 41.5)
  Multiple essays on non-.edu sites | 22 | 8.5 (5.7, 12.5)
Other non-educational domain web content* | 35 | 13.5 (9.9, 18.2)
  One other non-.edu website | 24 | 9.3 (6.3, 13.4)
  Multiple other non-.edu websites | 11 | 4.2 (2.4, 7.4)
Other essays in database | 9 | 3.5 (1.8, 6.5)
Print material | 2 | 0.7 (0.2, 2.8)

* Numerous websites were sources of matching content (n=105). The five most commonly matching were www.aippg.info (n=47), webcampus.med.drexel.edu (n=32), home.att.net/~ppmd/cv-ps/cv-ps.htm (n=17), www.medfools.com (n=14), and www.users.qwest.net (n=14). Sixteen other sites matched to more than one essay in the cohort.
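To make the similarity scores in these tables concrete, the sketch below illustrates one simple way a percent-match score could be computed: the fraction of an essay's word 5-grams that also occur in any reference document. This is a hypothetical stand-in for illustration only, not the algorithm used by the commercial detection software in this study.

```python
# Crude n-gram overlap score, 0-100: the percentage of the essay's word
# 5-grams that appear in at least one source document. Illustrative only;
# the actual detection software's scoring method is proprietary.

def ngrams(text, n=5):
    """Set of word n-grams in a text, case-insensitive."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_score(essay, sources, n=5):
    """Return 0-100: percent of the essay's n-grams found in any source."""
    essay_grams = ngrams(essay, n)
    if not essay_grams:
        return 0.0
    source_grams = set().union(*(ngrams(s, n) for s in sources))
    return 100.0 * len(essay_grams & source_grams) / len(essay_grams)
```

Under this scheme, an essay copied verbatim from a source scores 100, one sharing no five-word phrase with any source scores 0, and the 10% threshold used in the study would correspond to one tenth of the essay's phrases matching.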
Figure 1. Excerpts of representative similarity reports.
A. [similarity report excerpt; image not reproduced in text]
B. [similarity report excerpt; image not reproduced in text]
C. [similarity report excerpt; image not reproduced in text]