Another controversy, another round of blaming preprints

On February 1, Anand Ranganathan, the molecular biologist more popular as a columnist for Swarajya, amplified a new preprint paper from scientists at IIT Delhi that (purportedly) claims the genome of the Wuhan coronavirus (2019 nCoV) appears to contain some sequences also found in the human immunodeficiency virus but not in any other coronaviruses. Ranganathan also chose to magnify the preprint paper’s claim that the sequences’ presence was “non-fortuitous”.

To be fair, the IIT Delhi group did not properly qualify what they meant by this term, but that doesn’t exculpate Ranganathan and others who followed him for first amplifying with alarmist language a claim that did not deserve such treatment and then, once he discovered his mistake, wondering out loud whether such “non-peer reviewed studies” about “fast-moving, in-public-eye domains” should be published before scientific journals have subjected them to peer-review.

The more conservative scientist is likely to find ample room here to revive the claim that preprint papers only promote shoddy journalism, and that preprint papers that are part of the biomedical literature should be abolished entirely. This is bullshit.

The ‘print’ in ‘preprint’ refers to the act of a traditional journal printing a paper for publication after peer-review. A paper is designated ‘preprint’ if it hasn’t undergone peer-review yet, even though it may or may not have been submitted to a scientific journal for consideration. To quote from an article championing the use of preprints during a medical emergency, by three of the six cofounders of medRxiv, the preprints repository for the biomedical literature:

The advantages of preprints are that scientists can post them rapidly and receive feedback from their peers quickly, sometimes almost instantaneously. They also keep other scientists informed about what their colleagues are doing and build on that work. Preprints are archived in a way that they can be referenced and will always be available online. As the science evolves, newer versions of the paper can be posted, with older historical versions remaining available, including any associated comments made on them.

In this regard, first, Ranganathan’s decision to ring the alarm bells (with language like “oh my god”) when he first tweeted the link to the preprint paper, without sufficiently evaluating the attendant science, was his own, and not prompted by the paper’s status as a preprint. Second, the bioRxiv preprint repository where the IIT Delhi document showed up has a comments section, and it was brimming with discussion within minutes of the paper being uploaded. More broadly, preprint repositories are equipped to accommodate peer-review. So if anyone had looked in the comments section before tweeting, they wouldn’t have had reason to jump the gun.

Third, and most important: peer-review is not fool-proof. Instead, it is a legacy method employed by scientific journals to filter legitimate from illegitimate research and, more recently, higher quality from lower quality research (using ‘quality’ from the journals’ oft-twisted points of view, not as an objective standard of any kind).

This framing supports three important takeaways from this little scandal.

A. Much like preprint repositories, peer-reviewed journals also regularly publish rubbish. (By the same token, just as conventional journals also regularly publish the outcomes of good science, so do preprint repositories; in the case of 2019 nCoV alone, bioRxiv, medRxiv and SSRN together published at least 30 legitimate and noteworthy research articles.) It is just that conventional scientific journals conduct the peer-review before publication and preprint repositories (and research-discussion platforms like PubPeer), after. And, in fact, conducting the review after publication allows it to be a continuous process able to respond to new information, and not a one-time event that culminates with the act of printing the paper.

But notably, preprint repositories can recreate journals’ ability to closely control the review process and ensure only experts’ comments are in the fray by enrolling a team of voluntary curators. The arXiv preprint server has been successfully using a similar team to carefully eliminate manuscripts advancing pseudoscientific claims. As such, it is easier to make sure people are familiar with the preprint and post-publication review paradigm than to take advantage of their confusion and call for preprint papers to be eliminated altogether.

B. Those who support the idea that preprint papers are dangerous, and argue that peer-review is a better way to protect against unsupported claims, are by proxy advocating for the persistence of a knowledge hegemony. Peer-review is opaque, sustained by unpaid and overworked labour, and performs the same function that an open discussion often does at larger scale and with greater transparency. Indeed, the transparency represents the most important difference: since peer-review has traditionally been the demesne of journals, supporting peer-review is tantamount to designating journals as the sole and unquestionable arbiters of what knowledge enters the public domain and what doesn’t.

(Here’s one example of how such gatekeeping can have tragic consequences for society.)

C. Given these safeguards and perspectives, and as I have written before, bad journalists and bad comments will be bad irrespective of the window through which an idea has presented itself in the public domain. There is a way to cover different types of stories, and the decision to abdicate one’s responsibility to think carefully about the implications of what one is writing can never have a causal relationship with the subject matter. The Times of India and the Daily Mail will continue to publicise every new paper discussing whatever coffee, chocolate and/or wine does to the heart, and The Hindu and The Wire Science will publicise research published in preprint papers because we know how to be careful and which risks to protect ourselves against.

By extension, ‘reputable’ scientific journals that use pre-publication peer-review will continue to publish many papers that will someday be retracted.

An ongoing scandal concerning spider biologist Jonathan Pruitt offers a useful parable – that journals don’t always publish bad science due to wilful negligence or poor peer-review alone but that such failures still do well to highlight the shortcomings of the latter. A string of papers based on work that Pruitt led was found to contain implausible data in support of some significant conclusions. Dan Bolnick, the editor of The American Naturalist, which became the first journal to retract Pruitt’s papers that it had published, wrote on his blog on January 30:

I want to emphasise that regardless of the root cause of the data problems (error or intent), these people are victims who have been harmed by trusting data that they themselves did not generate. Having spent days sifting through these data files I can also attest to the fact that the suspect patterns are often non-obvious, so we should not be blaming these victims for failing to see something that requires significant effort to uncover by examining the data in ways that are not standard for any of this. … The associate editor [who Bolnick tasked with checking more of Pruitt’s papers] went as far back as digging into some of Pruitt’s PhD work, when he was a student with Susan Riechert at the University of Tennessee Knoxville. Similar problems were identified in those data… Seeking an explanation, I [emailed and then called] his PhD mentor, Susan Riechert, to discuss the biology of the spiders, his data collection habits, and his integrity. She was shocked, and disturbed, and surprised. That someone who knew him so well for many years could be unaware of this problem (and its extent), highlights for me how reasonable it is that the rest of us could be caught unaware.

Why should we expect peer-review – or any kind of review, for that matter – to be better? The only thing we can do is be honest, transparent and reflexive.