Plagiarism in residency application essays
Scott Segal, MD, MHCM
Brian J. Gelfand, MD
Shelley Hurwitz, PhD
Lori Berkowitz, MD
Stanley W. Ashley, MD
Eric S. Nadel, MD
Joel T. Katz, MD
Dr. Segal and Dr. Gelfand contributed equally to the study.
From Brigham and Women's Hospital, Harvard Medical School, Boston, MA,
Departments of Anesthesiology, Perioperative and Pain Medicine (SS, BJG), Internal
Medicine (JTK, SH), Emergency Medicine (ESN; also Massachusetts General Hospital
Department of Emergency Medicine), Obstetrics and Gynecology and Reproductive
Endocrinology (LB; also Massachusetts General Hospital Department of Obstetrics,
Gynecology and Reproductive Biology), and Surgery (SWA). Address correspondence
and reprint requests to Dr. Segal at Department of Anesthesiology, Perioperative and
Pain Medicine, Brigham and Women's Hospital, 75 Francis St., Boston, MA, 02115 or at
Word count: 3122 (text only)
Abstract word count: 270
Abstract
Background: Anecdotal reports suggest some residency application essays contain
plagiarized content.
Objective: To determine the prevalence of plagiarism in a large cohort of application
essays.
Design: Retrospective cohort study of residency application essays.
Setting: Single large academic medical center.
Participants: Essays submitted as part of applications to five large residency programs
at the same academic medical center between 9/1/2005 and 3/22/2007 in
anesthesiology, emergency medicine, internal medicine, obstetrics and gynecology, and
surgery (N=4975).
Measurements: Essays were compared to a database of Internet pages, published
works, and previously submitted essays by specialized software that returns a score of
0-100, representing the percentage of the submission matching another source. An
essay matching 10% or more of existing work was defined as evidence of plagiarism.
Results: Plagiarism was found in 5.2% (259/4975; 95% confidence interval [CI], 4.6%,
5.9%) of essays. Non-U.S. citizens’ essays were more likely to demonstrate evidence
of plagiarism; prevalence among U.S. citizens was 1.8% (1.4%, 2.3%), among foreign
nationals 13.9% (odds ratio [OR] 8.7 [95% CI, 6.4, 11.8], P<.0001), among permanent
residents 13.6% (OR 8.5 [5.7, 12.5], P<.0001), and among others 9.1% (OR 5.4 [1.6,
18.1], P=.0065). After adjustment for citizenship, other applicant characteristics
associated with a higher prevalence of plagiarism included medical school location
outside the U.S. and Canada; previous residency or fellowship; lack of research
experience, volunteer experience, or publications; a low USMLE Step 1 score; and
nonmembership in the Alpha Omega Alpha honorary society.
Limitations: The software database is likely incomplete, the 10% match threshold for
defining plagiarism has not been statistically validated, and the study was confined to
applicants to one institution.
Conclusions: Although more common in international applicants, plagiarism in
residency application essays was found in all specialties, from applicants of all medical
school types, and even among those with significant academic honors.
All applicants to United States residency programs must complete an original
essay known as the “Personal Statement.” The format is free-form and the content is
not specified. Expectations may vary by specialty; common themes include description
of motivation for seeking training in a chosen specialty, clarification of factors impacting
suitability for a field or program, relaying a critical incident that impacted career choice,
and circumstances that distinguish the applicant from others. In recent years,
numerous Web sites have offered advice and sample essays to assist applicants in this
effort. The Association of American Medical Colleges' (AAMC) Electronic Residency
Application Service (ERAS) lists several such sites but cautions applicants to submit
only original, personal work (1). ERAS also warns applicants that “any substantiated
findings of plagiarism may result in reporting of such findings to the programs to which
[they] apply now and in the future” (2). Applicants must certify that their work is accurate
and original before an ERAS application is complete.
Although anecdotal reports of personal statement plagiarism have appeared in
online discussion boards, no systematic investigation of the incidence of plagiarism has
been undertaken. The development of software that can efficiently search large
databases of previously created content has dramatically improved the ability to detect
plagiarized essays in other contexts. Using such software adapted for application
essays, we undertook an analysis of personal statements submitted with residency
applications in five specialties. The software returns a “similarity score” representing
the percentage of the submitted essay matching content in the database. The primary
goal of this investigation was to use this methodology to estimate the prevalence of
plagiarism in applicants’ personal statements at our institution, and determine its
association with demographic, educational, and experience-related applicant
characteristics.
METHODS
Participants
Personal statements in applications to residency programs at Brigham and
Women's Hospital in Boston, MA in five specialties, with planned starts in July 2007,
were analyzed for plagiarism. Applications were submitted between 9/1/2005 and
3/22/2007. Specialties studied were internal medicine, anesthesiology, general surgery,
obstetrics and gynecology, and emergency medicine, which are the five largest
residency programs at the institution. Because applicants had previously submitted the
material to the residency programs after attesting to its originality and veracity, and
because anonymity was assured by the study design, the institutional review board
approved the protocol and waived the requirement for informed consent from the
applicants.
Measurements and data collection
Essays were exported from the respective programs’ ERAS databases and
de-identified by replacing the applicant name and application number with a
computer-generated random number. A single copy of the master key linking code
numbers to the applicants’ Association of American Medical Colleges (AAMC) number
was retained. Other applicant data (gender, age, medical school type and location,
volunteer, research, or work experience, publications, Alpha Omega Alpha membership,
United States Medical Licensing Examination [USMLE] scores, citizenship, self-reported
“language fluency other than English”, previous residency or fellowship, certification
status by the Educational Commission for Foreign Medical Graduates [ECFMG]) was
exported from the respective ERAS databases, matched with the computer-generated
code numbers using AAMC number and specialty of application as keys, and
de-identified. Only data exportable as categorical information from ERAS were used. Free
text fields, medical school and undergraduate college names, specific volunteer,
research, or work activities, individual language skills, honors other than AOA, hobbies
and interests, and information about interviews and ranking for the National Resident
Matching Program (NRMP) were not analyzed.
Essays were analyzed for possible plagiarism by specialized software (Turnitin
for Admissions Essays, iParadigms, LLC, Oakland, CA). The program uses the
technique of digital fingerprinting to represent groups of characters in the source
document as numeric sequences, which are then sampled in groups of approximately 40
characters. The fingerprint of the submitted essay is then compared to a database
including web pages, printed resources, and previously submitted essays (3, 4). The
algorithm used efficiently searches for exact or close matches within 40 character
strings but also rejects very common phrases (3). The system also ignores material in
the submitted essay enclosed in quotation marks. The software returns a “similarity
score” between 0 and 100, representing the percentage of the submitted essay that
matches to a source in the database. A report showing the original source material
matching the submitted essay and highlighting the matching passages is also returned.
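The fingerprinting approach described above can be illustrated with a minimal sketch in the spirit of the winnowing algorithm (reference 4): hash overlapping character strings, then keep only a sample of the hashes. The parameters, normalization, and function names below are illustrative assumptions, not Turnitin's actual (proprietary) implementation.

```python
import hashlib

def fingerprint(text, k=40, window=4):
    # Normalize: lowercase and strip whitespace so trivial edits do not
    # defeat matching (an illustrative choice, not Turnitin's).
    text = "".join(text.lower().split())
    # Hash every k-character substring (a "k-gram").
    hashes = [
        int.from_bytes(hashlib.md5(text[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]
    # Winnowing-style sampling: keep the minimum hash of each sliding
    # window, so only a fraction of hashes represents the document.
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}

def similarity(essay, source, k=40, window=4):
    """Percentage of the essay's fingerprints also found in the source."""
    fe, fs = fingerprint(essay, k, window), fingerprint(source, k, window)
    return 100 * len(fe & fs) / len(fe) if fe else 0
```

Under this sketch, an essay identical to a source scores 100 and one sharing no long character run scores 0, mirroring the 0-100 similarity score the software returns.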
The Turnitin system will match submitted essays to others previously submitted
(which are permanently incorporated into its database) and will show such matches
preferentially over matches to an outside source. A matched essay can be identified
only by the institution that submitted it, and it was not possible to determine whether
the matching material also matched an Internet or other outside source.
Therefore, the order of submission can affect the similarity scores returned, so the initial
similarity scores were adjusted to resolve matches to other essays within the submitted
cohort. For essays matching solely to one other essay also submitted by the
investigators, we determined the author of the essays by comparing the coded identities
to the applicant’s unique applicant number within ERAS but maintained the de-
identification in the submitted essays. If an applicant applied to more than one specialty
and matched his or her own essay but neither essay matched any external source, the
score for both essays was set to 0. If an applicant matched his or her own essay and
that essay also matched an outside source, then the scores of both essays were set to
the lesser of the two scores originally reported for the applicant’s essays. If
an essay matched another in the submitted cohort from a different applicant, then
scores were kept as originally reported, although it was not possible to determine which
essay copied which or whether both copied an unidentified third source.
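The adjustment rules just described can be summarized in a short sketch; the function name and arguments are illustrative, not code used in the study.

```python
def adjust_scores(score_a, score_b, same_author, external_match):
    """Resolve a within-cohort match between two submitted essays.

    same_author: True if both essays came from the same applicant.
    external_match: True if the matched essay also matched an outside source.
    Returns the adjusted (score_a, score_b).
    """
    if same_author and not external_match:
        # Self-match only (one applicant applying to two specialties):
        # no evidence of plagiarism, so both scores become 0.
        return 0, 0
    if same_author and external_match:
        # Self-match plus an outside source: both essays receive the lesser
        # of the two originally reported scores.
        low = min(score_a, score_b)
        return low, low
    # Different applicants: scores stay as originally reported, since it is
    # unknown which essay copied which (or whether both copied a third source).
    return score_a, score_b
```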
The similarity score distribution was highly skewed; 75% of scores were zero.
Among non-zero scores, the median and mean were 3.0 and 8.2. Therefore, the
similarity score was dichotomized (plagiarism or not plagiarism) for analysis. Two
independent investigators (B.G. and S.S.) examined the unadjusted similarity scores
and reports for 100 randomly selected essays in which the score was >0 to determine
approximate cutoff values to use to dichotomize the scores into likely and unlikely
plagiarism. Scores from 0-4 typically reflected matches to proper names of individuals
or institutions, or relatively common short phrases, and therefore were generally thought
to represent minimal chance of true plagiarism. Scores between 4 and 10 represented
a mixture of similar innocent matches and others that were judged likely to be
plagiarism because of the original source material to which the essay matched.
Subsequent inspection of all essays with scores ≥10 (N=259, including 41 from the
screening sample) by two observers (B.G. and S.S.) revealed no cases deemed to be
false positives by either observer. Plagiarism was therefore defined as a similarity
score ≥10 and all analyses were carried out using this cutoff value, consistent with other
application essay plagiarism detection systems using different software (5).
Statistical analysis
Baseline characteristics of the applicant pools across specialties were compared
with the chi-square test for categorical variables or the Kruskal-Wallis test for age,
essay length, and USMLE Step 1 score. The proportion of similarity scores greater than or equal to
10 was calculated with 95% confidence intervals for the entire cohort and for each
applicant characteristic, and the ability of various individual variables available from the
ERAS databases to predict plagiarism was analyzed by logistic regression. Potential
collinearity was checked with diagnostics and examination of the variances of the
estimates. Multiple variable logistic regression was used to adjust for the influence of
US citizenship (yes/no) or US or Canadian medical school status (yes/no) in evaluating
the ability of each single variable to predict plagiarism. The results using US citizenship
and the results using US or Canadian medical school were very similar; the results
using US citizenship are reported here. Where the interaction between the variable and
US citizenship was significant, we report the subgroup results. All statistical tests were
two-sided, nominal P-values are reported, and SAS version 9.1 and JMP version 7.0
(both from SAS Institute, Cary, NC) were used.
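As a rough check on the reported figures, the overall prevalence and its confidence interval can be computed directly from the counts given in the Results (259 of 4975 essays). The sketch below uses a normal-approximation interval for illustration, which may differ in the last digit from the paper's (unstated) interval method; the odds-ratio helper takes counts, and the values shown in its test are hypothetical.

```python
import math

def prevalence_ci(k, n, z=1.96):
    """Proportion k/n with a normal-approximation 95% confidence interval."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

def odds_ratio(k1, n1, k0, n0):
    """Odds ratio of an outcome in group 1 versus reference group 0."""
    return (k1 / (n1 - k1)) / (k0 / (n0 - k0))

# Overall prevalence of plagiarism: 259 of 4975 essays, about 5.2%.
p, lo, hi = prevalence_ci(259, 4975)
```

With the normal approximation the interval is about (4.6%, 5.8%), close to the reported (4.6%, 5.9%), which was presumably computed with a slightly different method.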
Role of the Funding Source
No external funding source supported this study.
RESULTS
Applicant Characteristics
A total of 5010 applicant files were analyzed, and 4975 essays were screened for
evidence of plagiarism. Thirty-five files either had no essay or the essay could not be
extracted for technical reasons. The characteristics of the applicants within each specialty and for
the entire cohort are shown in Table 1. There were significant differences across
specialties for all abstracted features except for research experience.
Plagiarism
Plagiarism, defined as a similarity score greater than or equal to 10, was found in
5.2% (95% CI 4.6%, 5.9%) of essays overall (Table 2). The Figure shows excerpts
from three representative examples of plagiarized essays with scores of 10%, 34%, and 71%.
The distribution of similarity scores for the entire cohort is shown in Table 3. Essay
length was weakly inversely associated with similarity score (adjusted r2=.007,
P=.0016).
Essays from international applicants, whether defined by citizenship, medical
school type, or medical school location, were significantly more likely to demonstrate
plagiarism (Table 2). Essays from applicants with membership in Alpha Omega Alpha
(AOA; an honorary society recognizing students in the top 10-20% of their medical
school class), research experience, volunteer experience, and higher USMLE Step 1
scores were less likely to demonstrate plagiarism (Table 2). Essays from older
applicants, those indicating language fluency other than English, and those with
previous residency training showed a significantly higher prevalence of plagiarism
(Table 2). Controlling for international applicant status eliminated the effect of age and
other language fluency and reduced the effects of previous residency training, research
experience, volunteer experience, and AOA membership (Table 2). Among applicants
from non-U.S. or Canadian medical schools, certification by the Educational
Commission for Foreign Medical Graduates (ECFMG) was associated with a
significantly higher prevalence of plagiarism (Table 2). Comparison of prevalence of
plagiarism across specialties showed only Obstetrics and Gynecology to differ from the
others (Table 2, AOR 1.6 [95% CI 1.1, 2.3], P=.0073).
Sources of material to which essays with similarity scores ≥10 matched are
shown in Table 4.
DISCUSSION
Plagiarism may be defined as “the action or practice of taking someone else's
work, idea, etc., and passing it off as one's own; literary theft” (6). The recognition of
falsified material on applications for postgraduate medical training has been previously
documented. Investigators have reported a 10-30% prevalence of misleading
publication citations, including reference to non-existent articles, journals, or authorship
(7-13). One small study of 26 applications to a single family practice fellowship found
three cases of plagiarism in the application essays (14). The present study is the first
report of widespread plagiarism in personal statements in residency applications.
The results of the present investigation suggest plagiarism in approximately one
in twenty applications. Plagiarism was seen in essays from applicants to all specialties,
from domestic and foreign medical schools, in both genders, and among applicants with
strong academic credentials. Although confined to a single institution, our sample
represents 28-45% of the total number of applicants nationwide in the studied
specialties (15). It is unclear whether applicants to the five competitive programs in our
institution are wholly representative of the overall applicant pool, but the large fraction
applying makes systematic bias unlikely.
Our finding of significant plagiarism in residency application essays is worrisome.
First, it is likely that residency selection committees would find misrepresentation on the
application to be a strongly negative indicator of future performance as a resident. The
Accreditation Council for Graduate Medical Education (ACGME) has deemed
professionalism one of the six core competencies to be taught and assessed in
undergraduate and graduate medical education (16). The
authors believe that program directors would find a breach of professionalism in the
application to be an unacceptable baseline from which to begin residency. Second,
lapses in professionalism in medical school (18) and residency training (19) can be
predictive of future disciplinary action by state medical boards. Third, increasing public
scrutiny of physicians’ ethical behavior is likely to put pressure on training programs to
enforce strict rules of conduct, beginning with the application process.
The growth of the Internet has been implicated in the mounting evidence for
widespread plagiarism in high school, undergraduate and graduate education (20, 21).
Besides making vast amounts of material available for “cut and paste” or “cyber-”
plagiarism, the Internet appears to have changed how students regard material
available online compared with material published in print form. Copying from the Internet is viewed by a
disconcertingly large fraction of students as acceptable practice and not true plagiarism
(22). In response, several Web-based software solutions have been developed within
the liberal arts and other non-scientific communities to detect intentional
misrepresentation. The largest such service (turnitin.com) has been used successfully
at thousands of undergraduate and graduate institutions worldwide (23) and was
selected for the present study. Only recently has this methodology been
applied to scientific material, but these efforts have already identified significant
episodes of plagiarism and duplicate publication within the biomedical literature (24).
Our study adapted the latest iteration of this technique to target application essays.
Our study was not designed to determine the best cutoff value in the similarity
score returned by the software for differentiating true plagiarism from innocent matches
to other documents. Inspection of a number of similarity reports (which show the
matching content from the submitted essay and source material in the turnitin.com
database) demonstrated no cases judged to be innocent matches when the score was
≥10. This value was used as the dichotomous cutoff for our definition of plagiarism in
the present study. A similar system and 10% cutoff are used in the U.K. as part of the
Universities and Colleges Admissions Service (UCAS) (5). Table 3 illustrates the
potential effect of other cutoff values on the prevalence estimate of plagiarism.
However, only a formal sensitivity analysis of varying cutoff values could determine an
optimal value. Unfortunately, this would require an agreed-upon gold standard for
plagiarism, which does not exist.
There are, therefore, a number of factors that could make our prevalence
estimate either too high or too low. An essay that quotes extensive material in a block
not enclosed by quotation marks, uses multiple relatively short common phrases, or
cites at length an individual’s title or institution might match the database yet represent
appropriate use of the source material. It was not possible for us to vary the
40-character string length used by the software for matching to source material. In our
data, however, individual review of the 259 essays with a similarity score of ≥10
revealed no such instances felt to represent false positives. An essay previously
submitted by the author in a different setting or year might also match extensively to the
probed content, also yielding a false positive result. However, the Turnitin system had
not been used by any other residency programs prior to our study (personal
communication, Jeffrey Lorton, iParadigms, LLC).
Conversely, many factors may have led us to underestimate the true prevalence
of plagiarism. First, the turnitin.com database is likely incomplete. The database was
originally conceived to screen undergraduate and arts and science graduate school
applications, not medical school or residency applications. Furthermore, applicants
plagiarizing from other applicants’ essays that are not in the public space, or previously
submitted for analysis, would be missed by the software. This could include material
found only in books, essays passed directly between candidates, or essays submitted
by others in prior years. Second, close paraphrasing involving careful word substitution
may represent plagiarism but nonetheless escape detection by the software. Finally,
we found numerous examples of unattributed use of material that we strongly felt to be
plagiarism but that did not reach the 10% similarity score threshold.
Therefore, we believe that our prevalence estimate is conservative, and plagiarism is
likely more common than we report.
There are other limitations of our study. First, the detection software is limited by
the incorporation of submitted material into its database and its preference for showing
matches to other essays in the database instead of external sources. This may lead to
different results for a given essay depending on the order of submission to the system
and may have biased the estimates of ultimate sources of plagiarism or the applicant
characteristics associated with plagiarism. Moreover, this aspect of the software limited
our ability to infer the ultimate causes of plagiarism, sources of plagiarized content, or
strategies that might be employed to prevent future occurrences.
Second, the system detects content matches, not plagiarism per se. Any
determination of the latter is necessarily subjective, as there is no gold standard for
plagiarism. Moreover, as Wilhoit has observed, “plagiarism covers a multitude of errors,
ranging from sloppy documentation and proofreading to outright, premeditated fraud”
(25). In addition, students may genuinely be unaware of the generally accepted
standards for originality in their writing and their attitudes may differ strikingly from those
of faculty members (26). This may be particularly relevant for international applicants,
some of whom hail from cultures which do not view copying the work of others in the
same negative light as do Western academics (27, 28). Still another possibility is
cryptomnesia (“hidden memory”), in which an applicant is not aware that he or she is
copying remembered content from another source (27). Though our 10%
content-matching threshold has been used by others (5) and yielded no cases we
judged to be false positives, we cannot make any inferences regarding the intent of the
authors.
Third, many of the univariate predictors of plagiarism we identified likely interact
with each other. For example, publication history, older age, and other language
fluency correlated with previous residency training (data not shown). We reported each
characteristic as a univariate predictor of plagiarism, as our intent was to estimate the
prevalence of plagiarism, not to derive a rigorous predictive model for it. Only a
multivariate analysis in a new dataset would be able to find such a model.
How should residency selection committees respond to our findings? By
predetermined study design, the authors agreed to maintain the anonymity of the
shielded essays that were submitted and not unmask the identity of the applicants, so
we did not determine if any applicants with plagiarized essays were invited to interview
or matched to our programs. However, we believe that action should be taken to
discourage and detect plagiarism in the future. Ideally, this would take place at the level
of ERAS. Besides making the process unavoidable for applicants, this would
eliminate the difficulties inherent in multiple programs using the software
simultaneously, because essays submitted to the software become part of the database
for future submissions. Furthermore, manual inspection of the similarity report itself,
rather than simply reporting the score, would allow individual program directors to make
independent judgments regarding the seriousness of any putative offense. Finally,
there is reason to believe that the mere knowledge that essays are being screened by
plagiarism detection software would have a significant deterrent effect on would-be
plagiarizers (29, 30).
In summary, we have detected evidence of plagiarism among applicants to
residency programs in five specialties at a major academic training hospital. Although
more common in international applicants, the offense was found in all specialties, from
applicants of all medical school types, and even among those with significant academic
honors. We believe a concerted, nationwide effort to detect and deter such plagiarism
is warranted.
Figure legend
Figure: Excerpts of representative similarity reports from personal statements.
Submitted essays are on the left and the matched source material is on the right.
Matching text is highlighted in red. A. An essay scoring 10% (the threshold used to
define plagiarism). B. An essay scoring 34%. C. An essay scoring 71%.
Article and Author Information
Funding/support: Software was purchased with departmental funds (Anesthesiology,
Perioperative and Pain Medicine).
Financial disclosures: None of the authors has any relevant financial disclosures.
Author Contributions:
Study concept and design: Segal, Gelfand, Katz
Acquisition of data: Segal, Gelfand, Katz, Nadel, Berkowitz, Ashley
Analysis and interpretation of data: Segal, Hurwitz, Gelfand
Drafting of the manuscript: Segal, Gelfand
Critical revision of the manuscript for important intellectual content: Segal, Gelfand,
Katz, Nadel, Berkowitz, Ashley, Hurwitz
Statistical analysis: Segal, Hurwitz
Study supervision: Segal, Gelfand, Katz
References
1. Association of American Medical Colleges. The ERAS Integrity Promotion
Education Program. [cited 2009 February 3]; Available from:
http://www.aamc.org/students/erasfellow/policies/integrityeducation.htm.
2. Association of American Medical Colleges. MyERAS User Guide. 2009
[updated 2009; cited 2009 February 3]; Available from:
http://www.aamc.org/students/eras/resources/downloads/2009myerasresidency9
08.pdf.
3. Introna L, Hayes N. Plagiarism detection systems and international
students: detecting plagiarism, copying or learning? In: Roberts TS, editor.
Student plagiarism in an online world : problems and solutions. Hershey, PA:
Information Science Reference; 2008. p. 108-22.
4. Schleimer S, Wilkerson DS, Aiken A. Winnowing: local algorithms for
document fingerprinting. Proceedings of the 2003 ACM SIGMOD international
conference on Management of data; 2003; San Diego, CA. ACM; 2003. p. 76-85.
5. UCAS Similarity Detection Service - guidance for applicants. [cited 2009
March 30]; Available from:
http://www.ucas.com/students/startapplication/apply09/personalstatement/similari
tydetection.
6. Oxford English Dictionary Online. Oxford University Press; 2009 [updated
2009; cited 2009 June 3]; Available from: http://dictionary.oed.com.
7. Baker DR, Jackson VP. Misrepresentation of publications by radiology
residency applicants. Acad Radiol. 2000 Sep;7(9):727-9.
8. Cohen-Gadol AA, Koch CA, Raffel C, Spinner RJ. Confirmation of
research publications reported by neurological surgery residency applicants.
Surg Neurol. 2003 Oct;60(4):280-3; discussion 3-4.
9. Dale JA, Schmitt CM, Crosby LA. Misrepresentation of research criteria by
orthopaedic residency applicants. J Bone Joint Surg Am. 1999 Dec;81(12):1679-
81.
10. Hebert RS, Smith CG, Wright SM. Minimal prevalence of authorship
misrepresentation among internal medicine residency applicants: do previous
estimates of "misrepresentation" represent insufficient case finding? Ann Intern
Med. 2003 Mar 4;138(5):390-2.
11. Konstantakos EK, Laughlin RT, Markert RJ, Crosby LA. Follow-up on
misrepresentation of research activity by orthopaedic residency applicants: has
anything changed? J Bone Joint Surg Am. 2007 Sep;89(9):2084-8.
12. Patel MV, Pradhan BB, Meals RA. Misrepresentation of research
publications among orthopedic surgery fellowship applicants: a comparison with
documented misrepresentations in other fields. Spine. 2003 Apr 1;28(7):632-6;
discussion 1.
13. Yang GY, Schoenwetter MF, Wagner TD, Donohue KA, Kuettel MR.
Misrepresentation of publications among radiation oncology residency applicants.
J Am Coll Radiol. 2006 Apr;3(4):259-64.
14. Cole AF. Plagiarism in graduate medical education. Fam Med. 2007
Jun;39(6):436-8.
15. National Resident Matching Program. Results and Data: 2008 Main
Residency Match. 2008 [updated 2008; cited 2009 March 1]; Available from:
http://www.nrmp.org/data/resultsanddata2008.pdf.
16. Accreditation Council for Graduate Medical Education. General
competencies. [cited 2009 January 26]; Available from:
http://www.acgme.org/outcome/comp/compMin.asp.
17. Taylor CA, Weinstein L, Mayhew HE. The process of resident selection: a
view from the residency director's desk. Obstet Gynecol. 1995 Feb;85(2):299-
303.
18. Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional
behavior in medical school is associated with subsequent disciplinary action by a
state medical board. Acad Med. 2004 Mar;79(3):244-9.
19. Papadakis MA, Arnold GK, Blank LL, Holmboe ES, Lipner RS.
Performance during internal medicine residency training and subsequent
disciplinary action by state licensing boards. Ann Intern Med. 2008 Jun
3;148(11):869-76.
20. Scanlon PM. Student online plagiarism: How do we respond? College
Teaching. 2003;51(4):161-5.
21. Scanlon PM, Neumann DR. Internet plagiarism among college students.
Journal of College Student Development. 2002;43(3):374-85.
22. Rimer S. A Campus Fad That's Being Copied: Internet Plagiarism Seems
on the Rise. New York Times. September 3, 2003;Sect. B7.
23. Royce J. Trust or Trussed? Has Turnitin.com Got It All Wrapped Up?
Teacher Librarian. 2003;30(4):26-30.
24. Errami M, Hicks JM, Fisher W, Trusty D, Wren JD, Long TC, Garner HR.
Deja vu--a study of duplicate citations in Medline. Bioinformatics. 2008 Jan
15;24(2):243-9.
25. Wilhoit S. Helping students avoid plagiarism. College Teaching.
1994;42(4):161-4.
26. DeVoss D, Rosati AC. “It wasn’t me, was it?” Plagiarism and the Web.
Computers and Composition. 2002;19:191-203.
27. Park C. In Other (People’s) Words: plagiarism by university students—
literature and lessons. Assessment & Evaluation in Higher Education.
2003;28(5):471-88.
28. Hayes N, Introna L. Systems for the Production of Plagiarists? The
Implications Arising from the Use of Plagiarism Detection Systems in UK
Universities for Asian Learners. Journal of Academic Ethics. 2005;3:55-73.
29. Bilic-Zulle L, Azman J, Frkovic V, Petrovecki M. Is there an effective
approach to deterring students from plagiarizing? Sci Eng Ethics. 2008
Mar;14(1):139-47.
30. Braumoeller BF, Gaines BJ. Actions Do Speak Louder than Words:
Deterring Plagiarism with the Use of Plagiarism-Detection Software. PS: Political
Science & Politics. 2001;34(04):835-9.
Table 1: Characteristics of applicants, by specialty

Characteristic | Anesthesiology | Emergency Medicine | Internal Medicine | Obstetrics/Gynecology | Surgery | All | P value (difference across specialties)
Total applicants | 692 | 582 | 2463 | 570 | 703 | 5010 |
Citizenship* | | | | | | | <.0001
  U.S. Citizen | 548 (79.2) | 521 (89.5) | 1654 (67.2) | 399 (70.0) | 459 (65.3) | 3581 (71.5) |
  Foreign national | 92 (13.3) | 44 (7.6) | 631 (25.6) | 117 (20.5) | 154 (21.9) | 1038 (20.7) |
  Permanent resident | 49 (7.1) | 16 (2.7) | 160 (6.5) | 51 (9.0) | 82 (11.7) | 358 (7.1) |
  Other | 3 (0.4) | 1 (0.2) | 18 (0.7) | 3 (0.5) | 8 (1.1) | 33 (0.7) |
Medical school type† | | | | | | | <.0001
  U.S. Private | 237 (34.3) | 249 (42.8) | 964 (39.1) | 150 (26.3) | 208 (29.6) | 1808 (36.1) |
  U.S. Public | 216 (31.2) | 222 (38.1) | 622 (25.3) | 211 (37.0) | 177 (25.2) | 1448 (28.9) |
  International | 187 (27.0) | 91 (15.6) | 833 (33.8) | 188 (33.0) | 305 (43.4) | 1604 (32.0) |
  Fifth pathway and other | 52 (7.5) | 20 (3.4) | 44 (1.8) | 21 (3.7) | 13 (1.9) | 150 (3.0) |
Medical school location | | | | | | | <.0001
  U.S. or Canada | 504 (72.8) | 491 (84.4) | 1627 (66.1) | 381 (66.8) | 395 (56.2) | 3398 (67.8) |
  Foreign | 188 (27.2) | 91 (15.6) | 836 (33.9) | 189 (33.2) | 308 (43.8) | 1612 (32.2) |
ECFMG status (non-U.S. medical school applicants) | | | | | | | <.0001
  Certified | 151 (80.3) | 59 (64.8) | 721 (86.2) | 162 (85.7) | 260 (84.4) | 1353 (83.9) |
  Not certified | 37 (19.7) | 32 (35.2) | 115 (13.8) | 27 (14.3) | 48 (15.6) | 259 (16.1) |
Gender | | | | | | | <.0001
  Female | 250 (36.2) | 268 (46.1) | 1121 (45.5) | 420 (73.7) | 233 (33.2) | 2292 (45.8) |
  Male | 441 (63.8) | 314 (53.9) | 1341 (54.5) | 150 (26.3) | 468 (66.8) | 2714 (54.2) |
Age | 27.7 (25.9-30.5) | 27.1 (25.8-29.7) | 27.0 (25.7-29.4) | 27.4 (25.8-30.8) | 27.5 (26.0-30.9) | 27.2 (25.8-29.9) | <.0001
Member of AOA | 40 (5.8) | 64 (11.0) | 319 (13.0) | 55 (9.7) | 52 (7.4) | 530 (10.6) | <.0001
USMLE Step 1 score | 219 (205-232) | 224 (209-235) | 232 (216-246) | 222 (206-232) | 225 (210-237) | 225 (211-240) | <.0001
Language fluency other than English‡ | 434 (63.2) | 352 (60.9) | 1779 (73.2) | 407 (72.0) | 508 (73.0) | 3480 (70.2) | <.0001
Previous residency | 177 (25.6) | 74 (12.7) | 357 (14.5) | 105 (18.4) | 202 (28.8) | 915 (18.3) | <.0001
Publications | 380 (54.9) | 330 (56.7) | 1630 (66.2) | 332 (58.3) | 467 (66.4) | 3139 (62.7) | <.0001
Research experience | 547 (79.0) | 470 (80.8) | 1998 (81.1) | 441 (77.4) | 558 (79.4) | 4014 (80.1) | .27
Volunteer experience | 587 (84.8) | 533 (91.6) | 2125 (86.3) | 492 (86.3) | 583 (82.9) | 4320 (86.3) | .0002

Values shown as N (%) or median (Q1-Q3). P value for difference across specialties by chi-square test for categorical variables or Kruskal-Wallis test for continuous variables.
ECFMG = Educational Commission for Foreign Medical Graduates; AOA = Alpha Omega Alpha medical honor society; USMLE = United States Medical Licensing Examination.
* "Other" includes conditional permanent resident (N=24) and refugee/asylum/displaced person (N=9).
† "Other" includes Canadian medical school and U.S. osteopathic medical school students or graduates.
‡ Includes any language fluency self-reported by candidates.
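The between-specialty comparisons above can be reproduced directly from the tabulated counts. As a minimal sketch, the chi-square test of independence named in the table footnote is applied here to the gender distribution across the five specialties, in pure Python (stdlib only); the closed-form survival function is exact for the 4 degrees of freedom of this 5×2 table.

```python
import math

# (female, male) applicant counts by specialty, taken from Table 1:
# anesthesiology, emergency medicine, internal medicine,
# obstetrics/gynecology, surgery
observed = [(250, 441), (268, 314), (1121, 1341), (420, 150), (233, 468)]

row_totals = [f + m for f, m in observed]
col_totals = [sum(r[j] for r in observed) for j in (0, 1)]
grand = sum(row_totals)

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells,
# with expected counts E = row_total * col_total / grand_total
chi2 = sum(
    (obs - exp) ** 2 / exp
    for row, rt in zip(observed, row_totals)
    for obs, exp in zip(row, (rt * ct / grand for ct in col_totals))
)

# df = (5 rows - 1) * (2 columns - 1) = 4; for 4 degrees of freedom the
# chi-square survival function has the closed form exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

The statistic is driven largely by obstetrics/gynecology (73.7% female against 45.8% overall), and the resulting P value is far below the table's reported <.0001.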
Table 2: Prevalence of plagiarism (similarity score ≥ 10) in entire cohort and by selected characteristics of the applicants

Characteristic | n/N | % (95% CI) | Unadjusted P value | P value adjusted for U.S. citizenship
Entire cohort | 259/4975 | 5.2 (4.5, 5.9) | -- | --
Specialty*
  Anesthesiology | 30/675 | 4.4 (2.9, 6.0) | .34 | .85
  Emergency medicine | 10/578 | 1.7 (0.7, 2.8) | .0002 | .052
  Internal medicine | 141/2456 | 5.7 (4.8, 6.7) | .09 | .94
  Obstetrics and gynecology | 44/569 | 7.7 (5.5, 9.9) | .0043 | .0073
  Surgery | 34/697 | 4.9 (3.3, 6.5) | .67 | .20
Citizenship
  U.S. Citizen | 65/3561 | 1.8 (1.4, 2.3) | Reference | --
  Permanent resident | 48/353 | 13.6 (10.0, 17.2) | <.0001 | --
  Foreign national | 143/1028 | 13.9 (11.8, 16.0) | <.0001 | --
  Other | 3/33 | 9.1 (0, 18.9) | .0065 | --
Medical school type†
  U.S. Private | 22/1799 | 1.2 (0.72, 1.7) | Reference | Reference
  U.S. Public | 15/1443 | 1.0 (0.52, 1.6) | .63 | .67
  International | 218/1589 | 13.7 (12.0, 15.4) | <.0001 | <.0001
  Fifth pathway and other | 4/144 | 2.8 (0.09, 5.5) | .13 | .86
Medical school location | | | <.0001 | <.0001
  U.S. or Canada | 41/3378 | 1.2 (0.84, 1.6)
  Foreign | 218/1597 | 13.7 (12.0, 15.3)
ECFMG status (non-U.S. medical school applicants) | | | .017 | .22
  Certified | 195/1340 | 14.6 (12.8, 16.5)
  Not certified | 23/257 | 8.9 (6.0, 13.1)
Gender | | | .78 | .17
  Female | 116/2278 | 5.1 (4.2, 6.0)
  Male | 142/2693 | 5.3 (4.4, 6.1)
Age at application | --/4971 | -- | <.0001 | .49
Alpha Omega Alpha | | | <.0001 | .025
  Member | 5/526 | 0.95 (0.12, 1.8)
  Non-member | 254/4449 | 5.7 (5.0, 6.4)
USMLE Step 1 score | --/2165 | -- | <.0001 | .0002
Language fluency other than English‡ | | | <.0001 | .06
  Yes | 232/3451 | 6.7 (5.9, 7.6)
  No | 25/1469 | 1.7 (1.0, 2.4)
Previous residency or fellowship | | | <.0001 | .0087
  Yes | 112/900 | 12.4 (10.3, 14.6)
  No | 147/4075 | 3.6 (3.0, 4.2)
Research experience | | | <.0001 | .0022
  Yes | 160/3985 | 4.0 (3.4, 4.6)
  No | 99/990 | 10.0 (8.1, 11.9)
Publications | | | .0069 | .024
  Yes | 142/3122 | 4.6 (3.8, 5.3)
  No | 117/1853 | 6.3 (5.2, 7.4)
Volunteer experience§ | | | <.0001 | .0081
  Yes | 181/4290 | 4.2 (3.6, 4.8)
  No | 78/685 | 11.4 (9.0, 13.8)

ECFMG = Educational Commission for Foreign Medical Graduates; USMLE = United States Medical Licensing Examination.
* P value for each specialty relative to all other specialties.
† "Other" includes Canadian medical school and U.S. osteopathic medical school students or graduates.
‡ Includes any language fluency self-reported by candidates.
§ Interaction between U.S. citizenship and volunteer experience (P=.004): OR=0.29 (95% CI 0.16, 0.54; P<.0001) for U.S. citizens; OR=0.81 (95% CI 0.58, 1.12; P=.20) for non-U.S. citizens. No other statistical interactions were found among the applicant characteristics.
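The overall prevalence estimate can be checked from the counts alone. The sketch below recomputes the entire-cohort confidence interval with the normal-approximation (Wald) formula; the published interval may have been computed by a different method (e.g., exact binomial or Wilson), so the last digit can differ slightly.

```python
import math

# Wald 95% CI for the overall plagiarism prevalence: 259 essays with a
# similarity score >= 10 out of 4975 essays analyzed
flagged, total = 259, 4975
p = flagged / total
se = math.sqrt(p * (1 - p) / total)          # standard error of a proportion
lo, hi = p - 1.96 * se, p + 1.96 * se        # normal-approximation bounds
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}%, {100 * hi:.1f}%)")
```

This yields roughly 4.6% to 5.8%, consistent with the reported 4.5% to 5.9% up to the choice of interval method.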
Table 3: Distribution of similarity scores

Similarity score | N | Proportion
0-9 | 4716 | 94.8%
  0 | 3737 | 75.1%
  1 | 160 | 3.2%
  2 | 331 | 6.7%
  3 | 166 | 3.3%
  4 | 98 | 2.0%
  5 | 70 | 1.4%
  6 | 54 | 1.1%
  7 | 40 | 0.8%
  8 | 29 | 0.6%
  9 | 31 | 0.6%
10-19 | 134 | 2.7%
20-29 | 52 | 1.1%
30-39 | 27 | 0.54%
40-49 | 13 | 0.26%
50-59 | 8 | 0.16%
60-69 | 7 | 0.14%
70-79 | 6 | 0.12%
80-89 | 4 | 0.08%
90-99 | 6 | 0.12%
100 | 2 | 0.04%
Total | 4975 | 100%
Alternative Table 3: Distribution of similarity scores

Similarity score | N | Proportion
0-9 | 4716 | 94.8%
  0 | 3737 | 75.1%
  1 | 160 | 3.2%
  2 | 331 | 6.7%
  3 | 166 | 3.3%
  4 | 98 | 2.0%
  5 | 70 | 1.4%
  6 | 54 | 1.1%
  7 | 40 | 0.8%
  8 | 29 | 0.6%
  9 | 31 | 0.6%
10-19 | 134 | 2.7%
  10 | 24 | 0.48%
  11 | 17 | 0.34%
  12 | 15 | 0.30%
  13 | 13 | 0.26%
  14 | 15 | 0.30%
  15 | 13 | 0.26%
  16 | 13 | 0.26%
  17 | 8 | 0.16%
  18 | 8 | 0.16%
  19 | 8 | 0.16%
20-29 | 52 | 1.1%
30-39 | 27 | 0.54%
40-49 | 13 | 0.26%
50-59 | 8 | 0.16%
60-69 | 7 | 0.14%
70-79 | 6 | 0.12%
80-89 | 4 | 0.08%
90-99 | 6 | 0.12%
100 | 2 | 0.04%
Total | 4975 | 100%
Table 4: Sources of material matching submitted essays with similarity score ≥ 10 (N=259)

Source | n | % (95% Confidence interval)
Other essay in cohort | 166 | 64.1 (58.1, 69.7)
  One essay in cohort | 92 | 35.5 (29.9, 41.5)
  Multiple essays in cohort | 74 | 28.6 (23.4, 34.4)
Essay on educational domain (.edu) website* | 56 | 21.6 (17.0, 27.0)
  One essay on .edu sites | 53 | 20.5 (16.0, 25.8)
  Multiple essays on .edu sites | 3 | 1.1 (0.39, 3.3)
Other educational domain web content* | 8 | 3.1 (1.6, 6.0)
Essay on non-educational domain (.com, .info, .org, .net) website* | 114 | 44.0 (38.1, 50.1)
  One essay on non-.edu site | 92 | 35.5 (29.9, 41.5)
  Multiple essays on non-.edu sites | 22 | 8.5 (5.7, 12.5)
Other non-educational domain web content* | 35 | 13.5 (9.9, 18.2)
  One other non-.edu website | 24 | 9.3 (6.3, 13.4)
  Multiple other non-.edu websites | 11 | 4.2 (2.4, 7.4)
Other essays in database | 9 | 3.5 (1.8, 6.5)
Print material | 2 | 0.7 (0.2, 2.8)

* Numerous websites were sources of matching content (n=105). The five most commonly matching were www.aippg.info (n=47), webcampus.med.drexel.edu (n=32), home.att.net/~ppmd/cv-ps/cv-ps.htm (n=17), www.medfools.com (n=14), and www.users.qwest.net (n=14). Sixteen other sites matched to more than one essay in the cohort.
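To make the similarity scores in these tables concrete, the sketch below illustrates one simple way a percent-match score could be computed: the fraction of an essay's word 5-grams that also occur in any reference document. This is a hypothetical stand-in for illustration only, not the algorithm used by the commercial detection software in this study.

```python
# Crude n-gram overlap score, 0-100: the percentage of the essay's word
# 5-grams that appear in at least one source document. Illustrative only;
# the actual detection software's scoring method is proprietary.

def ngrams(text, n=5):
    """Set of word n-grams in a text, case-insensitive."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_score(essay, sources, n=5):
    """Return 0-100: percent of the essay's n-grams found in any source."""
    essay_grams = ngrams(essay, n)
    if not essay_grams:
        return 0.0
    source_grams = set().union(*(ngrams(s, n) for s in sources))
    return 100.0 * len(essay_grams & source_grams) / len(essay_grams)
```

Under this scheme, an essay copied verbatim from a source scores 100, one sharing no five-word phrase with any source scores 0, and the 10% threshold used in the study would correspond to one tenth of the essay's phrases matching.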
Figure 1. Excerpts of representative similarity reports.
A. [similarity report excerpt; image not reproduced in text]
B. [similarity report excerpt; image not reproduced in text]
C. [similarity report excerpt; image not reproduced in text]