Text analysis software developed by researchers from the Virginia Bioinformatics Institute (VBI) in Blacksburg, VA, and Collin College in Texas has revealed and categorized plagiarism in the MEDLINE biomedical literature database. The developers used eTBLAST, their text similarity–based search engine, to retrieve full-text articles indexed in MEDLINE and deposited the highly similar ones in the Déjà vu database for further analysis.
“We were surprised by the magnitude of the issue; 1–2% of papers seem to be questionably too similar,” said Harold “Skip” Garner, executive director of VBI and coauthor of a recent paper published in the Public Library of Science (PLoS) One. “Now, 1–2% doesn’t sound bad, except for when you multiply it by 20,000,000, which is the size of MEDLINE.”
Garner has been programming software to streamline biomedical literature searches since the 1990s and has previously published on the use of eTBLAST and Déjà vu for plagiarism detection in abstracts. But now his research team has analyzed full-text articles to classify patterns of plagiarism.
Garner’s team found that questionable papers could be identified by analyzing their abstracts. If the similarity between two abstracts surpasses a certain threshold, chances are good that the full text of the paper contains some plagiarism as well. While this abstract similarity threshold test could provide editors and publishers with an efficient, automated plagiarism check, Garner does not believe it is a replacement for manual review, which often illuminates mitigating factors, such as the paper’s position in a multi-step research protocol or its designation as a review. The VBI team is improving the eTBLAST and Déjà vu software to account for these patterns while still holding scientists to the same standard as other writers.
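To illustrate the idea of an abstract similarity threshold, here is a minimal Python sketch. It is not eTBLAST's algorithm, which the article does not describe. It substitutes a simple Jaccard overlap of three-word sequences, and the function names and the 0.5 cutoff are hypothetical choices made for illustration only.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_for_review(abstract_a, abstract_b, threshold=0.5):
    """Flag a pair of abstracts for manual review.

    The threshold is an assumed value, not one reported by the study.
    A flag means "a human should look", not "this is plagiarism".
    """
    return jaccard(abstract_a, abstract_b) >= threshold
```

In keeping with Garner's caution, a flagged pair in a sketch like this would go to an editor for manual review rather than being treated as a verdict.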
“If you look at any society group, there’s going to be a few percent of people who are not going to adhere to the norms of that group, who will cross the line,” said Garner. “We have to admit that there is going to be a tendency to be more competitive, to enhance one’s resume, to be able to win a grant in an environment where only 10% of grants are awarded, and there are considerable incentives for people to cross that line.”
Garner hopes that the study—which provides similarity analysis among different sections of articles, among papers with or without shared authors, and among subject reviews—leads to a tightening of publication standards and reinforces ethical submission practices. According to Garner, such a stringent submission review and analysis will lead to a grander overhaul in the scientific community and a renewed energy and efficiency.
“It is very important that we have the highest quality collection of biological and medical literature as possible, so we must make certain that inappropriate papers are not in there, and one of the first things to do is find them,” said Garner.
The paper, “Systematic characterizations of text similarity in full text biomedical publications,” was published 15 Sept. 2010 in PLoS One.