Are preprints reliable?

To quote from a paper published yesterday in PLOS Biology:

Does the information shared in preprints typically withstand the scrutiny of peer review, or are conclusions likely to change in the version of record? We assessed preprints from bioRxiv and medRxiv that had been posted and subsequently published in a journal through April 30, 2020, representing the initial phase of the pandemic response. We utilised a combination of automatic and manual annotations to quantify how an article changed between the preprinted and published version. We found that the total number of figure panels and tables changed little between preprint and published articles. Moreover, the conclusions of 7.2% of non-COVID-19-related and 17.2% of COVID-19-related abstracts undergo a discrete change by the time of publication, but the majority of these changes do not qualitatively change the conclusions of the paper.

Later: “A major concern with expedited publishing is that it may impede the rigour of the peer review process.”

So far, according to this and one other paper published in PLOS Biology, it seems reasonable to ask not whether preprints are reliable but what peer-review brings to the table. (By this I mean the conventional/legacy variety of closed pre-publication review.)

To the uninitiated: as open-access publishing grew in popularity and usefulness, particularly in the first year of the COVID-19 pandemic, some “selective” journals – to use wording from the PLOS Biology paper – and their hordes of scientist-supporters sought to stress the importance of peer-review in language that is familiar but rests on an increasingly outdated outlook: that peer-review is needed to prevent misinformation. I’ve found a subset of this argument – that peer-review is important for papers whose findings could save or end lives – to be more reasonable, and the rest just unreasonable and self-serving.

Funnily enough, two famously “selective” journals, The Lancet and the New England Journal of Medicine, retracted two papers related to COVID-19 care in the thick of the pandemic – undermining both their broader argument in favour of peer-review and, vis-à-vis the subset argument, the efficacy of their own peer-review processes.

Arguments in favour of peer-review are self-serving because more efficient, more transparent and more workable alternatives exist, yet many journals have failed to adopt them. They have instead used this repeatedly invalidated mode of reviewing papers to maintain their opaque style of functioning, which in turn – together with the purported cost of printing papers on physical paper – they use to justify the exorbitant prices they charge readers (here’s one ludicrous example).

For example, one alternative is open pre-publication peer-review, in which scientists upload their paper to a preprint server, like arXiv, bioRxiv or medRxiv, and share the link with their peers and, say, on social media platforms. There, independent experts review the paper’s contents and share their comments. The authors can then incorporate the necessary changes, with credit, as separate versions of the same paper on the server.

Further – and unlike ‘conventional’ journals’ laughable expectation that journalists write about the papers they publish without fear of being wrong – journalists subject preprint papers to the same treatment that is due the average peer-reviewed paper: reasonable and courteous scepticism, and qualifying their claims and findings with comments from independent experts, with an added caveat, though I personally think it unnecessary, that their subject is a preprint paper.

(Some of you might remember that in 2018, Tom Sheldon argued in a Nature News & Views article that peer-review facilitates good journalism. I haven’t come across a more objectionable argument in favour of conventional peer-review.)

However, making this mode of reviewing and publishing more acceptable has been very hard, not least because it demands repeatedly pushing back against scientists whose academic reputation depends on having published, and being able to publish, in “selective” journals, against the scientometric culture they uphold, and against their hollow arguments about the virtues of conventional, opaque peer-review. (Making peer-review transparent could also help deal with reviewers who use the anonymity it affords them to be sexist and racist.)

But with the two new PLOS Biology papers, we have an opportunity to flip these scientists’ and journals’ demand that preprint papers ‘prove’ or ‘improve’ themselves, and to ask instead what the legacy modes bring to the table. From the abstract of the second paper (emphasis added):

We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish.
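
To make the second paper’s approach a little more concrete: the sketch below is my own illustration, not the authors’ pipeline – the toy corpus and every name in it are made up – but it shows the general idea of averaging word2vec vectors into document embeddings and using cosine similarity to pair a preprint with its most linguistically similar published counterpart.

```python
# Minimal sketch (not the paper's actual code) of linking documents by
# word2vec-based document embeddings. The corpus below is a toy example.
from gensim.models import Word2Vec
import numpy as np

# Toy "documents": in practice these would be full preprint/article texts.
corpus = [
    "we report a candidate antiviral compound in cell culture".split(),
    "peer review changed the typesetting but not the conclusions".split(),
    "the antiviral compound reduced viral load in cultured cells".split(),
]

# Train a small word2vec model (the paper trained theirs on preprints).
model = Word2Vec(sentences=corpus, vector_size=50, min_count=1, seed=1)

def doc_embedding(tokens, model):
    """Average the vectors of in-vocabulary tokens to get a document vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

preprint = doc_embedding(corpus[0], model)
candidates = [doc_embedding(doc, model) for doc in corpus[1:]]

# The published version of a preprint should be its nearest neighbour.
print([round(cosine(preprint, c), 3) for c in candidates])
```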

It seems reasonable to me to ask about the rigour to which supporters of conventional peer-review have staked a claim when few papers appear to benefit from it. The process may be justified in those few cases where a paper is corrected in a significant way, and it may be difficult to identify those papers without peer-review – but open pre-publication review has an equal chance of catching the same errors (especially if we increase the discoverability of preprints the way journal editors identify eminent experts in the same field to review papers, instead of relying solely on social-media interactions that less internet-savvy scientists may not be able to initiate).

In addition, it appears that in most cases in which preprints were uploaded to bioRxiv first and then peer-reviewed and published by a journal, the authors didn’t submit papers that required significant quality improvements – certainly not to the extent that conventional peer-review’s supporters have alleged in an effort to make such review seem necessary.

So, why must conventional peer-review, in the broader sense, persist?