A while back I posted the following on twitter: “I hate writing methods sections for work that is the same as previous work: tweaking wording that works to avoid self-plagiarism is tedious.” My twitter auto-posts to Facebook, so my friends there saw it too. Interestingly, on FB I mostly got commiseration from others who similarly dislike having to do this (but who mostly also seem to do it). On twitter, however, the responses were mostly advice to ‘cite and copy’, and several people referred to COPE guidelines as justification.
The issues surrounding self-plagiarism, or text recycling as this practice is sometimes called, are too complex, I think, for a series of response tweets. So I decided to write a blog post about it, introducing the issue from my perspective. I also invited some friends and fellow researchers who commented on FB to contribute their perspectives. They will be added later and the post updated. (Here’s a link to one of them.)
From me:
I do behavioural work. Many of my studies are trying to understand aspects of language learning. How do we learn languages? What is easy or hard to learn? For whom? And why? Studying real people learning real languages can lead to hypotheses, but it is hard to definitively answer these questions using real-life learners learning real-life languages in real life. (Yes, I know that that is terribly repetitive. But it gets my point across.) So researchers in my field have to do something different to really get at the questions we are interested in. The paper in question discusses a study using a miniature artificial language methodology. Let me give you some background on this methodology. Hint: when I say methods I don’t mean (just) statistical methods or analyses, I mean the whole design of the study from start to finish. In my field, a lot of the ‘heavy lifting’ is done in the design, meaning the stimuli and test items. The nature of the data collection process is crucial and can be quite complex.
Although the general method has a pretty standard abbreviation (MAL, which I will use from here on in), there is nothing whatsoever that is standard about MAL methods. Each MAL is constructed to get at a specific question. Basically, the process of MAL development goes something like this: the researcher thinks about the specific variables they are interested in isolating in a language/learning situation/learner, and then designs a language or set of languages (each given to a different condition) that vary only on that single variable. Michael Erard (he writes a lot of great stuff about language) did a piece on MALs a few years back that explains the process and intent behind it well. https://motherboard.vice.com/en_us/article/sillyspeak-the-art-of-making-a-fake-language
In any case, each language is unique, and the specifics of the language need to be described in enough detail that a reader can evaluate whether it actually gets at the question it was supposedly designed to get at. In my work, I use a variety of different kinds of MALs to get at different kinds of questions. Sometimes the ‘language’ is just sounds. These are used, for instance, when researchers are interested in the kinds of statistical computations learners can perform and whether those computations can help learners discover the kinds of patterns that exist in real languages. This line of work got its start with Saffran, Aslin, & Newport (1996), and their basic method has been used in a great deal of follow-up work (including some out of my lab…). People are presented with a sample of MAL input for some (usually, but not always, prespecified) amount of time and are later tested on what they know. Testing usually involves judging items that are or are not consistent with the patterns in the input language. It might seem that this specific MAL is well known enough that methodological details beyond question- or theory-driven adjustments can be dealt with by simply citing the SAN paper. But it turns out that some seemingly irrelevant methodological differences might be important to learning outcomes (plug for research by my student). That means that, at this point, we shouldn’t simply leave out methodological details from these kinds of MAL studies.
Most of my MAL work (I do other things too) investigates very different questions and uses much more complex artificial languages; the words mean something, they are presented in sentences alongside video clips, and participants are asked to produce novel sentences (i.e., sentences they didn’t get in their exposure). They are also sometimes asked to make judgments about novel sentences that are or are not consistent with the patterns in their input. The specifics of the language design are important, as are the specifics of the judgment-task test items that are inconsistent with the patterns in the input. That is, the ‘ungrammatical’ MAL sentences can tell us different things depending on why or how they are ungrammatical. The specifics of the design are very important in these studies: if the language or the test items are not designed properly, the study won’t test what it is supposed to test. Thus, in any MAL research, a thorough description of the methods is very important for readers (and reviewers!) to be able to assess the results and the conclusions based on them, let alone replicate them.
The MALs used by SAN and related work are simple enough that it takes relatively little space to describe them well. However, the more complex languages I use in most of my work take a great deal more. Thus, the method sections in these papers are long if they (the methods) are well described. I (and others) tend to use base languages that I tweak as necessary to ask related questions. That means that there are multiple papers using very similar methods. It might seem, then, that I could simply refer back to the earliest paper for the basics of the methods and just explain any differences or deviations from the original in the new paper. But then the reader, or reviewer, could not actually assess a later paper on the basis of what is actually in that paper. As a reviewer, I hate it when I cannot assess a paper on the basis of what is in the paper. Don’t make me go look somewhere else to figure out whether what you did makes sense. So I am left with essentially repeating a great deal of content from one paper to the next. (Before you accuse me of salami-slicing, I don’t. These are papers asking related but different questions about a particular phenomenon, and so using very similar methods makes sense.) What to do?
Many of the tweets I received in response to my original tweet were telling me to go ahead and copy, being sure to cite the original, per COPE’s guidelines.
Let’s look at those guidelines (the journal I am planning on submitting the paper in question to is a COPE member).
I downloaded a copy from the following website https://publicationethics.org/files/Web_A29298_COPE_Text_Recycling.pdf on June 13, 2017. In what follows, I will inset any quotations from those guidelines to make clear which text is not mine.
These guidelines are intended to guide editors when dealing with cases of text recycling.
Text recycling, also known as self-plagiarism, occurs when sections of the same text appear (usually un-attributed) in more than one of an author’s own publications. The term ‘text recycling’ has been chosen to differentiate from ‘true’ plagiarism (i.e. when another author’s words or ideas have been used, usually without attribution).
A separate issue, not to be confused with text recycling, is redundant (duplicate) publication. Redundant (duplicate) publication generally denotes a larger problem of repeated publication of data or ideas, often with at least one author in common. This is outside the scope of these guidelines and is covered elsewhere.
Notice that it says “usually un-attributed”, suggesting that simply citing the appropriate original source does not necessarily make it not text recycling. Moving on…
How can editors deal with text recycling?
Editors should consider each case of text recycling on an individual basis as the ‘significance’ of the overlap, and therefore the most appropriate course of action, will depend on a number of factors.
Significance isn’t defined, and the factors that are discussed don’t really make significance any clearer (to me). Shortly thereafter it says this:
In general terms, editors should consider how much text is recycled. The reuse of a few sentences is clearly different to the verbatim reuse of several paragraphs of text, although large amounts of text recycled in the methods might be more acceptable than a similar amount recycled in the discussion.
In my work, it is more than a few sentences, and even ‘several paragraphs’ is pushing it. Clearly, reuse in methods sections is seen as being different, but even there, editors are being counseled to attend to the amount of repeated text. But what exactly counts as ‘large amounts’ that ‘might be more acceptable’? And notice that it doesn’t say ‘acceptable’, it says ‘more acceptable’. More acceptable can still be unacceptable. So far, clear as mud. The guidelines highlight the editors’ discretion, which means that they can be applied differently by different editors, and that can result in serious consequences for authors.
Text recycling may be discovered in a submitted manuscript by editors or reviewers, or by the use of plagiarism detection software (e.g. CrossCheck). If overlap is considered minor, action may not be necessary or the authors may be asked to re-write overlapping sections and cite their previous article(s) if they have not done so.
More significant overlap may result in rejection of the manuscript. Where the overlap includes data, editors should handle cases according to the COPE flowchart for dealing with suspected redundant publication in a submitted manuscript. Editors should ensure that they clearly communicate the reason for rejection to the authors.
This says authors may be asked to rewrite and cite (if they haven’t already), again indicating that just having cited yourself is not enough; the text shouldn’t be the same (i.e., it should have been rewritten).
And here are the guidelines published on the web by the journal’s publisher, Taylor & Francis. Copied text is again inset and is from the following website: http://authorservices.taylorandfrancis.com/ethics-for-authors/ (text copied below retrieved June 13, 2017):
Case 2: Plagiarism
“When somebody presents the work of others (data, words or theories) as if they were his/her own and without proper acknowledgment.” Committee of Publications Ethics (COPE)
When citing others’ (or your own) previous work, please ensure you have:
- Clearly marked quoted verbatim text from another source with quotation marks.
According to this, it might be fine if I just enclosed the pages (yes, pages) in question inside quotation marks. But pages and pages of quotations (even from my own work) seem excessive.
Shortly after that section is the following one (same website, same date of retrieval, copied text is again inset to make clear it is copied and not mine):
Make sure you avoid self-plagiarism
Self-plagiarism is the redundant reuse of your own work, usually without proper citation. It creates repetition in the academic literature and can skew meta-analyses if the same sets of data are published multiple times as “new” data. If you’re discussing your own previous work, make sure you cite it.
Taylor & Francis uses CrossCheck to screen for unoriginal material. Authors submitting to a Taylor & Francis journal should be aware that their paper may be submitted to CrossCheck at any point during the peer-review or production process.
Any allegations of plagiarism or self-plagiarism made to a journal will be investigated by the editor of the journal and Taylor & Francis. If the allegations appear to be founded, all named authors of the paper will be contacted and an explanation of the overlapping material will be requested. Journal Editorial Board members may be contacted to assist in further evaluation of the paper and allegations. If the explanation is not satisfactory, the submission will be rejected, and no future submissions may be accepted (at our discretion).
Note that the first sentence says ‘usually without proper citation’ not ‘without proper citation’. That means that even including a citation does not by itself clear you of self-plagiarism. It also does not distinguish methods sections from other sections of the paper. (As a language researcher I tend to notice these wording choices as well as words that are missing. Unless I’m editing my own work, in which case I am quite likely to miss missing words, make bad wording choices, etc.)
I found a paper in Biochemia Medica discussing this issue with a bit more clarity. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900061/) The paper attempts to lay out potential editorial policies regarding different kinds of self-plagiarism.
I will highlight a few sections from the paper. (Šupak-Smolčić, V., & Bilić-Zulle, L. (2013). How do we handle self-plagiarism in submitted manuscripts? Biochemia Medica, 23(2), 150–153. http://doi.org/10.11613/BM.2013.019)
In most cases of augmented manuscripts, the major overlap is seen within the methods section. As such, editors and readers can be misled to consider it as technical (self) plagiarism, which is usually not sanctioned with the same strictness as plagiarism of other parts of the paper. Nevertheless, if a submitted manuscript shows substantial overlap in the methods section with the author’s previous work, then the editor can consider this manuscript for publication only under the following circumstances:
- the author refers to his previous work,
- methods cannot be written in any other form without altering comprehensibility,
Although this section was about papers that reuse data, there is a relevant (imo) bit of text here: ‘methods cannot be written in any other form without altering comprehensibility’. This suggests that if they can be rewritten they should.
Later it seems to suggest that some overlap in methods might be OK, again at the discretion of the editor. But given the earlier passage just discussed, presumably, overlap is only deemed tolerable if unavoidable. In my paper, it is avoidable (as in, I can write it a different way, it’s just a hassle that is only being undertaken to avoid editorial hassles).
Based on the editorial policy of Biochemia Medica, upon detection of self-plagiarism, a submitted manuscript can be considered for publication only if it contains relevant new data and will contribute to overall scientific knowledge. Additional conditions have to be met:
When text similarity is observed with an author’s previous publication, and the original publication is cited, the submitted manuscript has to be revised, with the questionable parts corrected. Overlaps within the methods section can be tolerated, but the cut-off percentage is for the editor to decide. Similarities in the introduction section can be approached differently from the treatment of overlaps in the discussion and conclusion sections.
In case you think that this is silly and no one will ever face any consequences for text recycling: http://www.ithenticate.com/plagiarism-detection-blog/bid/94140/The-Challenge-of-Repeating-Methods-While-Avoiding-Plagiarism#.WUAFon0bjeQ (or search replies to my tweet to find the person whose paper got (desk?) rejected for this).
I’m not trying to pick on COPE or Taylor & Francis; I’m trying to lay out why it might not be as easy as the ‘just copy and cite’ advice I was getting. My suspicion is that that advice came from people working in very different fields with little appreciation for the nature of methods in other areas (and so why it might not be so easy for other researchers). We can have a discussion about whether these guidelines are reasonable; in fact, I think it would be good to do so. But I don’t see a way to come up with a one-size-fits-all approach to this precisely because of the differences in methods. For now, I think I’ll stick with reworking my methods sections as best I can while still including all of the relevant details, because I think that methods are important for evaluation, making people go elsewhere to read them is bad, and I don’t want to get dinged by checkers for too much overlapping text. And I think that this is probably true for most people in my field. Other fields are likely quite different in terms of how much specificity is really required. Moreover, I want people to know what I actually did! Too often people think you did something you didn’t do, and then make claims about your work that are incorrect. If I provide details in the papers, they have less of an excuse for that. (Same goes for me and other people’s work – I often go back to a paper thinking they did something, only to find out I was wrong. If the details aren’t there, that kind of checking is harder to do.)
(There is also the question of who actually ‘owns’ the words an author might wish to reuse. Aside from the copyright issues with many journal publications – often authors do not retain it, the journal does – if the original paper was co-authored, the words in question don’t really just ‘belong’ to a single author, and so are they really theirs to do with as they wish? I don’t know the answer to this, but it’s interesting to think about.)