Should peer review be double-blind?

As part of the recent discussion on anonymous peer review, several people spoke out in favor of double-blind peer review, where neither the authors nor the reviewers know who the others are. I have thought a lot about double-blind peer review, and I’m not entirely convinced, in particular when it comes to grant applications. While double-blind review might solve certain problems and remove certain biases, it would almost certainly amplify other issues, and whether the net effect would be good or bad is unclear. It would also give more power to people such as editors and program managers who operate outside the blinded process.

So let’s discuss. I’ll first cover journal articles and then grant proposals. The two are very different, and what applies to one does not necessarily apply to the other.

Journal articles

On the face of it, double-blind peer review for journal articles seems like a no-brainer. It allows junior scientists to be judged on the merit of their work alone, and it prevents senior scientists from coasting through peer review on the basis of their good name. However, as always, the devil is in the detail. There are at least three reasons I can think of why double-blind review may not be such a great idea after all.

First, the real power is with the editors, not the reviewers. It’s the editors who make the final decisions. And this power tends to increase with the perceived rank of the journal; editors of more prestigious journals are more likely to reject papers without review or because of a perceived lack of interest. We all know that you won’t publish in Nature if the Nature editors don’t like your work. And we also know that you can survive quite harsh reviewer criticisms if the Nature editors really want to publish your work [1]. Thus, one could make an argument that author names should be blinded to both reviewers and editors. But how would the editors invite unbiased reviewers if they don’t know who has a potential conflict of interest? The only practical way would be to have one editor invite the reviewers and another make the decision. This would be an interesting experiment, but I doubt any journal will go for it any time soon. Thus, as long as author names remain known to the handling editor, I doubt that double-blind review will make much of a difference.

Second, blinding the authors of an article is incredibly hard. Just removing the author names from the first page of a paper will frequently not do. If the authors work on a unique study system, or make extensive use of their prior work, or have developed a specific software package, or have posted their data on a public repository, reviewers will likely be able to find out who they are. I once reviewed a paper where the authors had removed their names not only from the title page but also from citations to their own work in the reference list. I’m still amazed that somebody would (i) go to these lengths to conceal their identity and (ii) not realize that this action completely obliterated any anonymity they might otherwise have had. Double-blinding may create an illusion of anonymity where none actually exists.

Third, double-blind review gives more power to scientists who are intent on submitting fraudulent work, because it is going to be harder for reviewers to identify patterns in the perpetrators’ activities. For example, I have my list of crooks with a solid portfolio on Retraction Watch, and when I get to review one of their papers I’ll be extra careful. These are usually scientists whose activities slipped by me the first time I reviewed one of their papers, because everything looked fine on the surface. If I regularly had to review their papers in a double-blind fashion, they would probably manage to slip by me more frequently.

One benefit of double-blind peer review, however, could be that even if the reviewers have a sense of which lab(s) may have been involved in a given study, they still won’t be able to guess the exact author list. For example, I don’t know to what extent reviewers are biased by the gender of the first author if a paper comes from an established lab, but if they are, that bias would likely disappear in double-blind review. Reviewers may guess correctly that the paper comes out of my lab, but they won’t know which of my students wrote it. Evidence in favor of this notion comes from one experiment in which double-blind peer review increased the number of female first authors. Similarly, when junior PIs continue working on research they began in an established person’s lab, reviewers won’t be able to tell whether the paper comes from the established lab or the junior PI. This could work to the advantage of junior PIs.

In summary, the positive and the negative aspects of double-blind review are about even, in my opinion. I have no major concerns about double-blind review, as long as I as an author am not expected to do anything more than remove my name from the author list. I’m not interested in going through my entire paper and making sure not a single sentence (e.g. “We have previously investigated…”) could give a hint at who I am. Also, we now frequently make all our data and code available in a GitHub repository, and I’m not going to go through extra effort to conceal who I am there. Other than that, I’d be happy to support more experiments in double-blind peer review, and I’ll also be happy to support double-blind peer review more strongly if evidence in its favor continues to accumulate.

Grant proposals

Grant proposals are an entirely different beast than journal articles, and I do not think that double-blind proposal review is a good idea. There is a fundamental difference between a journal article and a proposal. An article is the finished product. A grant proposal, by contrast, is only the promise of a future product. If one scientist writes ten times more high-profile papers than another, then she should publish ten times more frequently in high-profile journals, without question. However, just because one scientist is ten times better at writing grant proposals than another doesn’t mean he deserves ten times the funds. In fact, only if that scientist can write ten times as many papers or write papers that are ten times as important (however measured) would he deserve ten times the grant funding.

I strongly believe that the track record of past performance needs to be considered in proposal review. If you had to hand one person a check for a million dollars, would you rather give the money to somebody who consistently delivers interesting results, even if that person’s grant application doesn’t sound overly exciting, or would you prefer to give the money to somebody who can tell a great story but about whom you know nothing beyond that story? This thought is related to the idea (which is slowly sinking in with the NIH as well) that it is generally better to fund people than projects, or at least to have a healthy mix of people-based and project-based funding. By definition, if you’re evaluating people, you cannot blind the evaluators to their identity.

Now you could argue that the scientific review should be done blinded and the final funding decision be made by the program officer, who can take into account all the other relevant factors, such as track record, current funding of the applicant, etc. However, this would simply put more power into the hands of program officers, who might or might not use that power wisely. It’s certainly not unheard of for program officers in some agencies to preferentially fund their good buddies. The more power a program officer has to override a panel decision, the more likely those situations are to arise.

Double-blind grant review is also open to several sorts of manipulation by applicants. First, it would be easier than it already is to base an application on dubious, sketchy, or even entirely made-up data, because nobody would ever know [2]. Second, applicants could pack their proposals with prior results obtained by the biggest shot in the field, causing the reviewers to think they’re reviewing an application by that lab and rank it higher because of that. You might say that that’s exactly the point, only ideas matter, but I’ve seen too many scientists with great ideas and poor execution to feel comfortable with funding decisions based exclusively on ideas [3].

In conclusion, I don’t think that double-blind grant applications are the way to go. There are other ways to minimize biases in the review process. For example, panels could be given statistics on how many women and junior scientists submitted applications to a given competition, and if the composition of the top-ranked applicants deviates substantially from the overall composition of applicants, then the panel could be asked to reconsider their rankings. In general, just paying attention to these kinds of biases and monitoring whether certain groups of applicants are disproportionately affected by either positive or negative decisions should prevent the most egregious biases.
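
As a rough sketch of what such a composition check might look like, the Python snippet below compares a group's share among the top-ranked applicants with its share in the full applicant pool. Everything in it is an assumption made for illustration: the function names, the 5% threshold, and the choice of a one-sided hypergeometric test (which asks how likely it would be to see this few group members in the top ranks if rankings were unrelated to group membership). It is not a description of any agency's actual procedure.

```python
from math import comb


def prob_at_most(pool_size, group_size, top_size, group_in_top):
    """Probability of seeing at most `group_in_top` group members among
    `top_size` applicants drawn at random from a pool of `pool_size`
    applicants, `group_size` of whom belong to the group."""
    total = comb(pool_size, top_size)
    favorable = sum(
        comb(group_size, k) * comb(pool_size - group_size, top_size - k)
        for k in range(group_in_top + 1)
    )
    return favorable / total


def composition_check(pool_size, group_size, top_size, group_in_top, alpha=0.05):
    """Compare the group's share in the pool and in the top ranks, and flag
    the panel's ranking for a second look if the group is underrepresented
    by more than chance alone would plausibly explain (hypothetical rule)."""
    p_low = prob_at_most(pool_size, group_size, top_size, group_in_top)
    return {
        "pool_share": group_size / pool_size,
        "top_share": group_in_top / top_size,
        "p_underrepresented": p_low,
        "reconsider": p_low < alpha,
    }


# Made-up numbers: 100 applicants, 40 of them women, 20 top-ranked slots,
# 3 of which went to women.
result = composition_check(pool_size=100, group_size=40, top_size=20, group_in_top=3)
print(result)
```

A flag from a check like this would only be a prompt for the panel to take a second look, not evidence of bias in itself; with small competitions, even sizable deviations can arise by chance.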

Update (10/19/2014): The article I quoted claiming an increased number of female first authors under double-blind review has since been called into question, as comparable journals have similarly seen an increase in the number of female first authors during the same time period, without instituting double-blind review. Thanks to Matt Hodgkinson for pointing this out.

Update #2 (10/19/2014): This paper, pointed out to me by Matt Hodgkinson, provides a thorough review of what is currently known about biases in peer review. It shows mixed evidence on gender bias. In particular with respect to journal articles, current evidence suggests bias isn’t that pronounced (female and male authors have comparable acceptance rates).

Notes

[1] I certainly have reviewed papers for Nature that I thought should not be published there and the editors overruled me. And this is fine; editors should have the ultimate decision power. I’m an editor myself, and on occasion I accept papers that reviewers say should be rejected. The point remains, though, that an editor who really wants to publish a paper will rarely be deterred by negative reviews, in particular if the reviews don’t call out egregious errors in the work.

[2] Even under the current system of non-blinded review, grant applicants can include sketchy or made-up data with little risk to their career or reputation. While such activity is obviously fraudulent and will have severe consequences if discovered, the likelihood of discovery is low. First, only three to five other scientists ever see the application, and only for a short period of time. So if an applicant, for example, reuses the same data set in subsequent applications but labels the resulting figure differently, it’s very unlikely anybody would notice. Similarly, if an applicant claims preliminary data support one hypothesis and later publishes a paper supporting a different hypothesis, he could always argue that that is indeed how discovery went: first things looked one way but after more careful study it became clear the other way was right. Only the most blatantly obvious fraud, such as publishing fraudulent data in a paper and then using that paper as preliminary results in an application, has any likelihood of being discovered. For these reasons, a colleague of mine here at UT thinks that results that aren’t published or maybe at least deposited on a public archive should not be allowed in grant applications at all.

[3] As with everything in life, I think some balance is required here. Grant applicants should have to demonstrate some amount of prior expertise in the work they propose, but they should also be given the benefit of the doubt that some things can be worked out as the research is done.