A close-up photograph of Freeman Dyson.

Freeman Dyson’s PhD

The physicist, thinker and writer Freeman Dyson passed away on February 28, 2020, at the age of 96. I wrote his obituary for The Wire Science; excerpt:

The 1965 Nobel Prize for the development of [quantum electrodynamics] excluded Dyson. … If this troubled Dyson, it didn’t show; indeed, anyone who knew him wouldn’t have expected differently. Dyson’s life, work, thought and writing is a testament to a philosophy of doing science that has rapidly faded through the 20th century, although this was due to an unlikely combination of privileges. For one, in 1986, he said of PhDs, “I think it’s a thoroughly bad system, so it’s not quite accidental that I didn’t get one, but it was convenient.” But he also admitted it was easier for him to get by without a PhD.

His QED paper, together with a clutch of others in mathematical physics, gave him a free pass to more than just dabble in a variety of other interests, not all of them related to theoretical physics and quite a few wandering into science fiction. … In 1951, he was offered a position to teach at Cornell even though he didn’t have a doctorate.

Since his passing, many people have latched on to the idea that Dyson didn’t care for awards, and have celebrated the fact that “he didn’t even bother getting a PhD” as if it were a difficult but inspiring personal choice. It’s certainly an unlikely position to assume, and – considering how central PhDs have become to the research ecosystem, together with the declining quality of theses produced at ‘less elite’ institutions – it makes for the sort of historical moment that those displeased with the status quo can anchor themselves to and swing from for reform.

This said, I’m uncomfortable with such utterances when they don’t simultaneously acknowledge the privileges that secured for Dyson his undoubtedly deserved place in history. Even a casual reading of Dyson’s circumstances suggests he didn’t have to complete his doctoral thesis (under Hans Bethe at Cornell University) because he’d been offered a teaching position on the back of his contributions to the theory of quantum electrodynamics, and was hired by the Institute for Advanced Study in Princeton a year later.

It’s important to mention – and thus remember – which privileges were at play so that a) we don’t end up unduly eulogising Dyson, or anyone else, and b) we don’t attribute Dyson’s choice to his individual personality alone instead of also admitting the circumstances Dyson was able to take for granted and which shielded him from adverse consequences. He “didn’t bother getting a PhD” because he wasn’t the worse for it; in one interview, he says he feels himself “very lucky” he “didn’t have to go through it”. On the other hand, even those who don’t care for awards today are better off with one or two because:

  • The nature of research has changed
  • Physics has become much more specialised than it was in 1948-1952
  • Degrees, grants, publications and awards have become proxies for excellence when sifting through increasingly overcrowded applicants’ pools
  • Guided by business decisions, journals’ definition of ‘good science’ has changed
  • Vannevar Bush’s “free play of free intellects” paradigm of administering research is much less in currency
  • Funding for science has dropped, partly because the Second World War ended and took a chunk of administrative freedom with it

The expectations of scientists have also changed. IIRC Dyson didn’t take on any PhD students, perhaps as a result of his dislike for the system (among other reasons because he believed it penalises students not interested in working on a single problem for many years at a time). But considering how the burdens on national education systems have shifted, his decision would be much harder to sustain today even if all of the other problems didn’t exist. Moreover, he has referred to his decision as a personal choice – that it wasn’t his “style” – so treating it as a prescription for others may mischaracterise the scope and nature of his disagreement.

However, questions about whether Dyson might have acted differently if he’d had to really fight the PhD system, which he certainly had problems with, are moot. I’m not discussing his stomach for a struggle nor am I trying to find fault with Dyson’s stance; the former is a pointless consideration and the latter would be misguided.

Instead, it seems to me to be a question of what we do know: Dyson didn’t get a PhD because he didn’t have to. His privileges were a part of his decision and cemented its consequences, and a proper telling of the account should accommodate them even if only to suggest a “Dysonian pride” in doing science requires a strong personality as well as a conspiracy of conditions lying beyond the individual’s control, and to ensure reform is directed against the right challenges.

Featured image: Freeman Dyson, October 2005. Credit: ioerror/Wikimedia Commons, CC BY-SA 2.0.

A trumpet for Ramdev

The Print published an article entitled ‘Ramdev’s Patanjali does a ‘first’, its Sanskrit paper makes it to international journal’ on February 5, 2020. Excerpt:

In a first, international science journal MDPI has published a research paper in the Sanskrit language. Yoga guru Baba Ramdev’s FMCG firm Patanjali Ayurveda had submitted the paper. Switzerland’s Basel-based MDPI … published a paper in Sanskrit for the first time. Biomolecules, one of the peer-reviewed journals under MDPI, has carried video abstracts of the paper on a medicinal herb, but with English subtitles. … The Patanjali research paper, published on 25 January in a special issue of the journal titled ‘Pharmacology of Medicinal Plants’, is on medicinal herb ‘Withania somnifera’, commonly known as ‘ashwagandha’.

This article is painfully flawed.

1. MDPI is a publisher, not a journal. It featured on Beall’s list (with the customary caveats) and has published some obviously problematic papers; I’ve heard good things about some of its titles and bad things about others. The journalist should have delineated this distinction instead of taking the simple fact of publication in a journal at face value. Even then, qualifying a journal as “peer-reviewed” doesn’t cut it anymore. In a time when peer-review can be hacked (thanks to its relative opacity) and the whole publishing process subverted for profit, all journalists writing on matters of science – as opposed to just science journalists – need to perform their own checks to certify the genealogy of a published paper, especially if the name of the journal(s) and its exercise of peer-review are being employed in the narrative as markers of authority.

2. People want to publish research in English so others can discover and build on it. A paper written in Sanskrit is a gimmick. The journalist should have clarified this point instead of letting Ramdev’s minions (among the authors of the paper) claim brownie points for their feat. It’s a waste of effort, time and resources. More importantly, The Print has conjured a virtue out of thin air and broadcast asinine claims like “This is the first step towards the acceptance of ‘Sanskrit language’ in the field of research among the international community.”

3. The article has zero critique of the paper’s findings, no independent comments and no information about the study’s experimental design. This is the sort of nonsense that an unquestioning commitment to objectivity in news allows: reporters can’t simply report that someone said something if what they said is wrong, misleading, harmful or all three. Magnifying potentially indefensible claims relating to scientific knowledge – or knowledge that desires the authority of science’s approval – without contextualising them and fact-checking them where necessary may be objective but it is also a public bad. It pays to work with the assumption (even when it doesn’t apply) that at least 50% of your readers don’t know better. That way, even if only 1% of your readers (an extremely conservative estimate for audiences in India) don’t know better – a number that can easily run into the thousands – you avoid misinforming them by communicating too little.

4. A worryingly tendentious statement appears in the middle of the piece: “The study proves that WS seeds help reduce psoriasis,” the journalist writes, without presenting any evidence that she checked. It seems possible that the journalist believes she is simply reporting the occurrence of a localised event – in the form of the context-limited proof published in a paper – without acknowledging that proving a hypothesis is a process, not an event: it is ongoing. This holds irrespective of the certainty of the experiment’s conclusions: even if one scientist has established with 100% confidence that the experiment they designed has sustained their hypothesis, and published their results in a legitimate preprint repository and/or a journal, other scientists will need to replicate the test, and still others are likely to have questions they’ll need answered.

5. The experiment was conducted in mice, not humans. Cf. @justsaysinmice

6. “‘We will definitely monetise the findings. We will be using the findings to launch our own products under the cosmetics and medicine category,’ Acharya [the lead author] told ThePrint.” It’s worrying to discover that the authors of the paper, and Baba Ramdev, who funded them, plan to market a product based on just one study, in mice, in a possibly questionable paper, without any independent comments about the findings’ robustness or tenability, to many humans who may not know better. But the journalist hasn’t pressed Acharya or any of the other authors on questions about the experiment or their attempt to grab eyeballs by writing and speaking in Sanskrit, or on how they plan to convince the FSSAI to certify a product for humans based on a study in mice.

The cycle

Is it just me or does everyone see a self-fulfilling prophecy here?

For a long time, and assisted ably by the ‘publish or perish’ paradigm, researchers sought to have their papers published in high-impact-factor journals – a.k.a. prestige journals – like Nature.

Such journals in turn, assisted ably by parasitic strategies, made these papers highly visible to other researchers around the world and, by virtue of being high-IF journals, tainted the results in the papers with a measure of prestige, ergo importance.

Evaluation and awards committees in turn noticed these papers over others and picked their authors for rewards, further amplifying their work, increasing the opportunity cost incurred by the researchers who lost out, and increasing the prestige attached to the high-IF journals.

Run this cycle a few million times and you end up with the impression that there’s something journals like Nature get right – when in fact it’s just mostly a bunch of business practices to ensure they remain profitable.
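The cycle described above is essentially preferential attachment, and a toy simulation makes the point concrete. This is a minimal sketch with made-up parameters (ten identical journals, a fixed prestige boost per accepted paper), not a model of any real publisher:

```python
import random

# Toy model of the prestige feedback loop: each paper goes to a journal
# with probability proportional to that journal's current prestige, and
# every acceptance boosts the journal's prestige a little. All numbers
# here are invented for illustration.
random.seed(42)

prestige = [1.0] * 10            # ten journals, initially identical
for _ in range(10_000):          # one iteration per submitted paper
    r = random.uniform(0, sum(prestige))
    cumulative = 0.0
    for j, p in enumerate(prestige):
        cumulative += p
        if r <= cumulative:      # authors prefer prestigious journals
            prestige[j] += 0.1   # acceptance adds to perceived prestige
            break

prestige.sort(reverse=True)
print([round(p, 1) for p in prestige])
# Despite identical starting points, a few journals end up with most of
# the prestige: a rich-get-richer effect, not a signal of quality.
```

The point of the sketch is only that differentiation emerges from the loop itself; no journal in the toy model “gets anything right”.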

National Geographic magazines. Credit: Unsplash

The case for preprints

Daniel Mansur, the principal investigator of a lab at the Universidade Federal de Santa Catarina that studies how cells respond to viruses, had this to say about why preprints are useful in an interview to eLife:

Let’s say the paper that we put in a preprint is competing with someone and we actually have the same story, the same set of data. In a journal, the editors might ask both groups for exactly the same sets of extra experiments. But then, the other group that’s competing with me works at Stanford or somewhere like that. They’ll order everything they need to do the experiments, and the next day three postdocs will be working on the project. If there’s something that I don’t have in the lab, I have to wait six months before starting the extra experiments. At least with a preprint the work might not be complete, but people will know what we did.

Preprints level the playing field by eliminating one’s “ability to publish” in high-IF journals as a meaningful measure of the quality of one’s work.

While this makes it easier for scientists to compete with their better-funded peers, my indefatigable cynicism suggests there must be someone out there who’s unhappy about this. Two kinds of people come immediately to mind: journal publishers and some scientists at highfalutin universities like Stanford.

Titles like Nature, Cell, the New England Journal of Medicine and Science, and especially those published by the Elsevier group, have ridden the impact factor (IF) wave to great profit through many decades. In fact, IF continues to be the dominant mode of evaluating research quality because it’s easy and not time-consuming, so – given how IF is defined – these journals continue to be important for being important. They also provide a valuable service – the double-blind peer review, which Mansur thinks is the only thing preprints are currently lacking. But other than that (and with post-publication peer-review being largely suitable), their time of obscene profits is surely running out.
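For reference, a journal’s impact factor is just a ratio, and that simplicity is much of why it endures. A minimal sketch of the standard calculation (the figures in the example are invented):

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Impact factor for year Y: citations received in Y to items the
    journal published in Y-1 and Y-2, divided by the number of citable
    items it published in those two years."""
    return citations / citable_items

# e.g. 3,000 citations in 2020 to papers from 2018-19, across 500
# citable items published in those two years (invented numbers)
print(impact_factor(3000, 500))  # -> 6.0
```

Nothing in the ratio measures the quality of any individual paper, which is why leaning on it to evaluate researchers is so contested.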

The pro-preprint trend in scientific publishing is also bound to have jolted some scientists whose work received a leg-up by virtue of their membership in elite faculty groups. Like Mansur says, a scientist from Stanford or a similar institution can no longer claim primacy, or uniqueness, by default. As a result, preprints definitely improve the forecast for good scientists working at less-regarded institutions – but an equally important consideration would be whether preprints also diminish the lure of fancy universities. They do have one less thing to offer now, or at least in the future.

English as the currency of science’s practice

K. VijayRaghavan, the secretary of India’s Department of Biotechnology, has written a good piece in Hindustan Times about how India must shed its “intellectual colonialism” to excel at science and tech – particularly by shedding its obsession with the English language. This, as you might notice, parallels a post I wrote recently about how English plays an overbearing role in our lives, and particularly in the lives of scientists, because it remains a language many Indians don’t need to get through their days. Having worked closely with the government in drafting and implementing many policies related to the conduct and funding of scientific research in the country, VijayRaghavan is able to take a more fine-grained look at what needs changing and whether change is possible. Most hearteningly, he says it is – if only we have the will to change. As he writes:

Currently, the bulk of our college education in science and technology is notionally in English whereas the bulk of our high-school education is in the local language. Science courses in college are thus accessible largely to the urban population and even when this happens, education is effectively neither of quality in English nor communicated as translations of quality in the classroom. Starting with the Kendriya Vidyalayas and the Navodaya Vidyalayas as test-arenas, we can ensure the training of teachers so that students in high-school are simultaneously taught in both their native language and in English. This already happens informally, but it needs formalisation. The student should be free to take exams in either language or indeed use a free-flowing mix. This approach should be steadily ramped up and used in all our best educational institutions in college and then scaled to be used more widely. Public and private colleges, in STEM subjects for example, can lead and make bi-lingual professional education attractive and economically viable.

Apart from helping students become more knowledgeable about the world through a language of their choice (for the execution of which many logistical barriers spring to mind, not the least of which is finding teachers), it’s also important to fund academic journals that allow these students to express their research in their language of choice. Without this component, they will be forced to fall back on English, which is bound to be counterproductive to the whole enterprise. This form of change will require material resources as well as a shift in perspective that could be harder to attain. Additionally, as VijayRaghavan mentions, there also need to be good quality translation services for research in one language to be expressed in another so that cross-disciplinary and/or cross-linguistic tie-ups are not hampered.

Featured image credit: skeeze/pixabay.


The language and bullshitness of ‘a nearly unreadable paper’

Earlier today, the Retraction Watch mailing list highlighted a strange paper written by a V.M. Das disputing the widely accepted fact that our body clocks are regulated by the gene-level circadian rhythm. The paper is utter bullshit. Sample its breathless title: ‘Nobel Prize Physiology 2017 (for their discoveries of molecular mechanisms controlling the circadian rhythm) is On Fiction as There Is No Molecular Mechanisms of Biological Clock Controlling the Circadian Rhythm. Circadian Rhythm Is Triggered and Controlled By Divine Mechanism (CCP – Time Mindness (TM) Real Biological Clock) in Life Sciences’.

The use of language here is interesting. Retraction Watch called the paper ‘unreadable’ in the headline of its post because that’s obviously a standout feature of this paper. I’m not sure why Retraction Watch is highlighting nonsense papers on its pages – watched by thousands every day for intriguing retraction reports informed by the reporting of its staff – but I’m going to assume its editors want to help all their readers set up their own bullshit filters. And the best way to do this, as I’ve written before, is to invite readers to participate in understanding why something is bullshit.

However, to what extent do we think unreadability is a bullshit indicator? And from whose perspective?

There’s no exonerating the ‘time mindness’ paper because those who get beyond the language are able to see that it’s simply not even wrong. But if you had judged it only by its language, you would’ve landed yourself in murky waters. In fact, no paper should be judged by how it exercises the grammar of the language its authors have decided to write it in. Two reasons:

1. English is not the first language for most of India. Those who’ve been able to afford an English-centred education growing up or hail from English-fluent families (or both) are fine with the language but I remember most of my college professors preferring Hindi in the classroom. And I assume that’s the picture in most universities, colleges and schools around the country. You only need access to English if you’ve also had the opportunity to afford a certain lifestyle (cosmopolitan, e.g.).

2. There are not enough good journals publishing in vernacular languages in India – at least not that I know of. The ‘best’ is automatically the one in English, among other factors. Even the government thinks so. Earlier this year, the University Grants Commission published a ‘preferred’ list of journals; only papers published herein were to be considered for career advancement evaluations. The list left out most major local-language publications.

Now, imagine the scientific vocabulary of a researcher who prefers Hindi over English, for example, because of her educational upbringing as well as to teach within the classroom. Wouldn’t it be composed of Latin and English jargon suspended from Hindi adjectives and verbs, a web of Hindi-speaking sensibilities straining to sound like a scientist? Oh, that recalls a third issue:

3. Scientific papers are becoming increasingly hard to read, with many scientists choosing to actively include words they wouldn’t use around the dinner table because they like how the ‘sciencese’ sounds. In time, to write like this becomes fashionable – and to not write like this becomes a sign of complacency, disinterest or disingenuousness.

… to the mounting detriment of those who are not familiar with even colloquial English in the first place. To sum up: if a paper shows other, more ‘proper’ signs of bullshit, then it is bullshit no matter how much its author struggled to write it. On the other hand, a paper can’t be suspected of badness just because its language is off – nor can it be called bad as such if its language is all that’s off about it.

This post was composed entirely on a smartphone. Please excuse typos or minor formatting issues.


A reviewer at the National Institutes of Health (US) evaluates a grant proposal.

A conference’s peer-review was found to be sort of random, but whose fault is it?

It’s not a good time for peer-review. Sure, if you’ve been a regular reader of Retraction Watch, it’s never been a good time for peer-review. But aside from that, the process has increasingly borne the brunt of criticism for failing to stem the publication of results that – after publication – have been found to be the product of bad research practices.

The problem may be that the reviewers are letting the ‘bad’ papers through but the bigger issue is that, while the system itself has been shown to have many flaws – not excluding personal biases – journals rely on the reviewers and naught else to stamp accepted papers with their approval. And some of those stamps, especially from Nature or Science, are weighty indeed. Now add to this muddle the NIPS wrangle, where researchers may have found that some peer-reviews are just arbitrary.

NIPS stands for the Neural Information Processing Systems (Foundation), whose annual conference was held in the second week of December 2014, in Montreal. It’s considered one of the main conferences in the field of machine-learning. Around the time, two attendees – Corinna Cortes and Neil Lawrence – performed an experiment to judge how arbitrary the conference’s peer-review could get.

Their modus operandi was simple. All the papers submitted to the conference were peer-reviewed before they were accepted. Cortes and Lawrence then routed a tenth of all submitted papers through a second peer-review stage, and observed which papers were accepted or rejected in the second stage (according to Eric Price, NIPS ultimately accepted a paper if either group of reviewers accepted it). Their findings were distressing.

About 57%* of all papers accepted in the first review were rejected during the second review. To be sure, each stage of the review was presumably equally competent – it wasn’t as if the second stage was more stringent than the first. That said, 57% is a very big number. More than five times out of 10, peer-reviewers disagreed on what could be published. In other words, in an alternate universe, the same conference but with only the second group of reviewers in place was generating different knowledge.

Lawrence was also able to eliminate a possibly redeeming confounding factor, which he described in a Facebook discussion on this experiment:

… we had a look through the split decisions and didn’t find an example where the reject decision had found a ‘critical error’ that was missed by the accept. It seems that there is quite a lot of subjectivity in these things, which I suppose isn’t that surprising.

It doesn’t bode well that the NIPS conference is held in some esteem among its attendees for having one of the better reviewing processes. Including the 90% of the papers that did not go through a second peer-review, the total predetermined acceptance rate was 22%, i.e. reviewers were tasked with accepting 22 papers out of every 100 submitted. Put another way, the reviewers were rejecting 78%. And this sheds light on the more troubling perspective of their actions.

If the reviewers had been rejecting papers at random, they would’ve done so at the tasked rate of 78%. At NIPS, one can only hope that they weren’t – so the second group was purposefully rejecting 57% of the papers that the first group had accepted. In an absolutely non-random, logical world, this number should have been 0%. So the fact that 57% is closer to 78% than it is to 0% implies some of the rejection was random. Hmm.
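One rough way to quantify this, using only the figures quoted above: treat 0% disagreement as perfectly consistent reviewing and 78% (the tasked rejection rate) as fully random reviewing, and ask how far along that scale the observed 57% lies. A back-of-the-envelope calculation:

```python
# Observed: group 2 rejected 57% of the papers group 1 had accepted.
# If group 2 had been rejecting at random, that figure would be 78%
# (the rate both groups were tasked with); perfectly consistent
# reviewing would give 0%.
observed = 0.57
random_baseline = 0.78

# Fraction of the way from 'fully consistent' (0.0) to 'random' (1.0)
arbitrariness = observed / random_baseline
print(f"{arbitrariness:.0%}")
```

By this crude measure the second review behaved roughly 73% of the way towards random rejection – which is exactly what makes the 57% figure so uncomfortable.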

While this is definitely cause for concern, forging ahead on the basis of arbitrariness – which machine-learning theorist John Langford defines as the probability that the second group rejects a paper that the first group has accepted – wouldn’t be the right way to go about it. This is similar to the case with A/B-testing: we have a test whose outcome can be used to inform our consequent actions, but using the test itself as a basis for the solution wouldn’t be right. For example, the arbitrariness can be reduced to 0% simply by having both groups accept every nth paper – a meaningless exercise.

Is our goal to reduce the arbitrariness to 0% at all? You’d say ‘Yes’, but consider the volume of papers being submitted to important conferences like NIPS and the number of reviewer-hours available to evaluate them. In the history of conferences, surely some judgments must have been arbitrary for the reviewer to have fulfilled his/her responsibilities to his/her employer. So you see the bigger issue: it’s not just the reviewers; it’s also the so-called system that’s flawed.

Langford’s piece raises a similarly confounding topic:

Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?

Problems like these are necessarily difficult to solve because of the number of players involved. In fact, it wouldn’t be entirely surprising if we found that nobody or no institution was at fault except how they were all interacting with each other, and not just in fields like machine-learning. A study conducted in January 2015 found that minor biases during peer-review could result in massive changes in funding outcomes if the acceptance rate was low – such as with the annual awarding of grants by the National Institutes of Health. Even Nature is wary about the ability of its double-blind peer-review to solve the problems ailing normal ‘peer-review’.

Perhaps for the near future, the only takeaway is likely going to be that ambitious young scientists are going to have to remember that, first, acceptance – just as well as rejection – can be arbitrary and, second, that the impact factor isn’t everything. On the other hand, it doesn’t seem possible in the interim to keep from lowering our expectations of peer-reviewing itself.

*The number of papers routed to the second group after the first was 166. The overall disagreement rate was 26%, so they would have disagreed on the fates of 43. And because they were tasked with accepting 22% – which is 37 or 38 – group 1 could be said to have accepted 21 that group 2 rejected, and group 2 could be said to have accepted 22 that group 1 rejected. Between 21/37 (56.7%) and 22/38 (57.8%) is 57%.

Hat-tip: Akshat Rathi.

A review panel at the National Institutes of Health (US) evaluates a grant proposal.

A conference’s peer-review was found to be sort of random, but whose fault is it?

It’s not a good time for peer-review. Sure, if you’ve been a regular reader of Retraction Watch, it’s never been a good time for peer-review. But aside from that, the process has increasingly taken the blame for failing to stem the publishing of results that were later found to be the product of bad research practices.

The immediate problem may be that reviewers are letting ‘bad’ papers through, but the bigger issue is that journals rely on reviewers and naught else to stamp accepted papers with their approval, even though the system itself has been shown to have many flaws – personal biases not excluded. And some of those stamps, especially from Nature or Science, are weighty indeed. Now add to this muddle the NIPS wrangle, where researchers may have found that some peer-reviews are just arbitrary.

NIPS stands for the Neural Information Processing Systems (Foundation), whose annual conference was held in the second week of December 2014 in Montreal. It’s considered one of the main conferences in the field of machine-learning. Around that time, two attendees – Corinna Cortes and Neil Lawrence – performed an experiment to judge how arbitrary the conference’s peer-review could get.

Their modus operandi was simple. All the papers submitted to the conference were peer-reviewed before they were accepted. Cortes and Lawrence routed a tenth of all submitted papers through a second, independent peer-review, and observed which papers were accepted or rejected in that second stage. (According to Eric Price, NIPS ultimately accepted a paper if either group of reviewers accepted it.) Their findings were distressing.

About 57%* of all papers accepted in the first review were rejected during the second review. To be sure, each stage of the review was presumably equally competent – it wasn’t as if the second stage was more stringent than the first. Even so, 57% is a very big number. More than five times out of ten, peer-reviewers disagreed on what could be published. In other words, in an alternate universe, the same conference with only the second group of reviewers in place would have generated different knowledge.

Lawrence was also able to eliminate a possibly redeeming confounding factor, which he described in a Facebook discussion on this experiment:

… we had a look through the split decisions and didn’t find an example where the reject decision had found a ‘critical error’ that was missed by the accept. It seems that there is quite a lot of subjectivity in these things, which I suppose isn’t that surprising.

It doesn’t bode well that the NIPS conference is held in some esteem among its attendees for having one of the better reviewing processes. Including the 90% of papers that did not go through a second peer-review, the predetermined acceptance rate was 22%, i.e. reviewers were tasked with accepting 22 papers out of every 100 submitted. Put another way, they were rejecting 78%. And this puts their actions in a more troubling light.

Had the reviewers been rejecting papers at random, the second group would have rejected the first group’s accepted papers at the tasked rate of 78%. Had reviewing been perfectly consistent, that figure would have been 0%. What the experiment found was 57% – closer to 78% than to 0% – implying that a good deal of the rejection was effectively random. Hmm.

While this is definitely cause for concern, forging ahead on the basis of arbitrariness – which machine-learning theorist John Langford defines as the probability that the second group rejects a paper that the first group has accepted – wouldn’t be the right way to go about it. This is similar to the case with A/B-testing: we have a test whose outcome can be used to inform our consequent actions, but using the test itself as a basis for the solution wouldn’t be right. For example, the arbitrariness can be reduced to 0% simply by having both groups accept every nth paper – a meaningless exercise.
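Langford’s degenerate case is easy to check with a quick simulation. The sketch below is a toy model (the submission count and the every-5th-paper rule are my assumptions for illustration, not from the experiment): it computes the arbitrariness for two groups reviewing independently at random, and for two groups mechanically accepting every nth paper, showing the latter drives arbitrariness to 0% without making review any more meaningful.

```python
import random

def arbitrariness(decisions_a, decisions_b):
    """Langford's arbitrariness: the probability that the second group
    rejects a paper the first group accepted."""
    accepted = [i for i, a in enumerate(decisions_a) if a]
    return sum(not decisions_b[i] for i in accepted) / len(accepted)

random.seed(0)
N = 10_000        # hypothetical number of submissions (assumption)
ACCEPT = 0.22     # NIPS's predetermined acceptance rate

# Model 1: two groups each accept 22% of papers independently at random.
g1 = [random.random() < ACCEPT for _ in range(N)]
g2 = [random.random() < ACCEPT for _ in range(N)]
print(f"random reviewing: arbitrariness ~ {arbitrariness(g1, g2):.0%}")  # ~ 78%

# Model 2: both groups mechanically accept every 5th paper (~20%).
h = [i % 5 == 0 for i in range(N)]
print(f"every-nth trick:  arbitrariness = {arbitrariness(h, h):.0%}")    # 0%
```

The point of the second model is exactly Langford’s: zero arbitrariness is trivially achievable by a rule that evaluates nothing, so arbitrariness alone can’t be the thing we optimize.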

Is our goal to reduce the arbitrariness to 0% at all? You’d say ‘Yes’, but consider the volume of papers being submitted to important conferences like NIPS and the number of reviewer-hours available to evaluate them. In the history of conferences, surely some judgments must have been arbitrary simply for the reviewer to fulfil his/her responsibilities to his/her employer. So you see the bigger issue: it’s not just the reviewers as much as the system itself that’s flawed.

Langford’s piece raises a similarly confounding topic:

Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?

Problems like these are necessarily difficult to solve because of the number of players involved. In fact, it wouldn’t be entirely surprising if we found that nobody or no institution was at fault except how they were all interacting with each other, and not just in fields like machine-learning. A study conducted in January 2015 found that minor biases during peer-review could result in massive changes in funding outcomes if the acceptance rate was low – such as with the annual awarding of grants by the National Institutes of Health. Even Nature is wary about the ability of its double-blind peer-review to solve the problems ailing normal ‘peer-review’.

For the near future, the only takeaway is likely to be that ambitious young scientists will have to remember that, first, acceptance – just as much as rejection – can be arbitrary and, second, that the impact factor isn’t everything. In the interim, on the other hand, it doesn’t seem possible to keep from lowering our expectations of peer-review itself.

*The number of papers routed to the second group was 166. The overall disagreement rate was 26%, so the two groups would have disagreed on the fates of about 43 papers. Because each group was tasked with accepting 22% – i.e. 37 or 38 papers – group 1 can be said to have accepted 21 papers that group 2 rejected, and group 2 to have accepted 22 that group 1 rejected. Between 21/37 (56.8%) and 22/38 (57.9%) lies 57%.
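The footnote’s arithmetic can be reproduced in a few lines of Python (the variable names are mine; the figures are the ones given above):

```python
# Figures from the footnote to the NIPS piece.
papers = 166               # papers routed through a second review
disagreement_rate = 0.26   # overall disagreement between the two groups
accept_rate = 0.22         # predetermined acceptance rate

disagreements = round(papers * disagreement_rate)  # ~43 papers disagreed on
accepted = papers * accept_rate                    # ~36.5, i.e. 37 or 38

# The 43 disagreements split roughly evenly between the two groups (21 + 22):
low = 21 / 37    # papers group 1 accepted but group 2 rejected
high = 22 / 38   # papers group 2 accepted but group 1 rejected

print(f"disagreements: {disagreements}")
print(f"arbitrariness between {low:.1%} and {high:.1%}")
```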

Hat-tip: Akshat Rathi.

Some research misconduct trends by the numbers

A study published in eLife on August 14, 2014, looked at data pertaining to papers published between 1992 and 2012 that the Office of Research Integrity had determined contained research misconduct. From the abstract:

Data relating to retracted manuscripts and authors found by the Office of Research Integrity (ORI) to have committed misconduct were reviewed from public databases. Attributable costs of retracted manuscripts, and publication output and funding of researchers found to have committed misconduct were determined. We found that papers retracted due to misconduct accounted for approximately $58 million in direct funding by the NIH between 1992 and 2012, less than 1% of the NIH budget over this period. Each of these articles accounted for a mean of $392,582 in direct costs (SD $423,256). Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI.

While the number of retractions worldwide is on the rise – also because the numbers of papers being published and of journals are on the rise – the study addresses a subset of these papers and only those drawn up by researchers who received funding from the National Institutes of Health (NIH).

[Chart: frequency of publications retracted for misconduct, 1992-2012]

Among them, there is no discernible trend in terms of impact factors and attributable losses. In the chart below, the size of each datapoint corresponds to the direct attributable loss and its color, to the impact factor of the journal that published the paper.

[Chart: retracted papers, with datapoint size showing direct attributable loss and color showing journal impact factor]

However, is the time to retraction dropping?

The maximum time to retraction has been on the decline since 1997. However, on average, the time to retraction is still fluctuating, influenced as it is by the number of papers retracted and the nature of misconduct.

[Chart: trend in time to retraction]

No matter the time to retraction or the impact factors of the journals, most scientists experience a significant difference in funding before and after the ORI report comes through, as the chart below shows, sorted by quanta of funds. The right axis displays total funding pre-ORI and the left, total funding post-ORI.

[Chart: total funding before and after ORI censure, by quanta of funds]

As the study’s authors summarize in their abstract: “Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI,” while total funding toward all implicated researchers went from $131 million to $74.5 million.

There could be some correlation between the type of misconduct and the decline in funding, but there’s not enough data to determine that. Nonetheless, there are eight instances in 1992-2012 when the amount of funding increased after the ORI report – of which the lowest rise is seen for John Ho, who committed fraud, and the highest for Alan Landay, implicated for plagiarism, a ‘lesser’ charge.

[Chart: instances of funding increasing after an ORI report]

From the paper:

The personal consequences for individuals found to have committed research misconduct are considerable. When a researcher is found by the ORI to have committed misconduct, the outcome typically involves a voluntary agreement in which the scientist agrees not to contract with the United States government for a period of time ranging from a few years to, in rare cases, a lifetime. Recent studies of faculty and postdoctoral fellows indicate that research productivity declines after censure by the ORI, sometimes to zero, but that many of those who commit misconduct are able to find new jobs within academia (Redman and Merz, 2008, 2013). Our study has found similar results. Censure by the ORI usually results in a severe decrease in productivity, in many cases causing a permanent cessation of publication. However the exceptions are instructive.

Retraction Watch reported the findings with especial focus on the cost of research misconduct. They spoke to Daniele Fanelli, one part of whose quote – though no less than the rest – is particularly notable.

The question of collateral damage, by which I mean the added costs caused by other research being misled, is controversial. It still has to be conclusively shown, in other words, that much research actually goes wasted directly because of fabricated findings. Waste is everywhere in science, but the role played by frauds in generating it is far from established and is likely to be minor.

References

Stern, A.M., Casadevall, A., Steen, R.G. and Fang, F.C., ‘Financial costs and personal consequences of research misconduct resulting in retracted publications’, eLife, August 14, 2014; 3:e02956.

Plagiarism is plagiarism

In a Nature article, Praveen Chaddah argues that a paper guilty only of textual plagiarism should carry a correction rather than be retracted, because a retraction makes the useful ideas and results in the paper unavailable. On the face of it, this is an argument that draws a distinction between the writing of a paper and the production of its technical contents.

Chaddah proposes to preserve this distinction for the benefit of science by punishing plagiarists only for what they plagiarized. If they pinched text, issue a correction and an apology but let the results stay; if they pinched the hypothesis or the results, retract the paper. He believes this is justifiable because it does not retard the introduction of new ideas into the pool of knowledge, and does not harm the notion of “research as a creative enterprise” for as long as the hypothesis, method and/or results are original.

I disagree. Textual plagiarism is also the violation of an important creative enterprise that, in fact, has become increasingly relevant to science today: communication. Scientists have to use communication effectively to convince people that their research deserves tax-money. Scientists have to use communication effectively to make their jargon understandable to others. Plagiarizing the ‘descriptive’ part of papers, in this context, is to disregard the importance of communication, and copying the communicative bits should be tantamount to copying the results, too.

He goes on to argue that if textual plagiarism has been detected but the hypothesis/results are original, the latter must be allowed to stand. This appears to assume that scientific journals are the same as specialist forums, which prioritize results over the full package: introduction, formulation, description, results, discussion, conclusion, etc. Scientific journals are not just the “guarantors of the citizen’s trust in science” (The Guardian) but also resources that journalists, analysts and policy-makers use to understand the extent of that guarantee.

What journalist doesn’t appreciate a scientist who is able to articulate his/her research well – let alone begrudge him/her the publicity it will bring?

In September 2013, the journal PLoS ONE retracted a paper by a group of Indian authors for textual plagiarism. The incident exemplified a disturbing attitude toward the offence: one of the authors, Ram Dhaked, complained that it was the duty of PLoS ONE to detect their plagiarism before publishing the paper, glibly abdicating responsibility.

As Chaddah argues, the authors of a paper could be plagiarizing text for a variety of reasons – but somehow they believe that lifting chunks of text from other papers during the writing process is allowable or will go unchecked. As an alternative, publishers could consider – or might already be considering – the ethics of ghost-writing.

He finally posits that papers with plagiarized text should remain available along with the correction. That would increase the visibility of the offence and, over time, presumably shame scientists into not plagiarizing – but that’s not the point. The point is to get scientists to understand why it is important to think about what they’ve done and to communicate their thoughts. That journals retract both the text and the results even when only the text was plagiarized is an important way to reinforce that point. If anything, Chaddah’s contention should have been with reducing the stigma a retraction leaves on one’s bio.