A trumpet for Ramdev

The Print published an article entitled ‘Ramdev’s Patanjali does a ‘first’, its Sanskrit paper makes it to international journal’ on February 5, 2020. Excerpt:

In a first, international science journal MDPI has published a research paper in the Sanskrit language. Yoga guru Baba Ramdev’s FMCG firm Patanjali Ayurveda had submitted the paper. Switzerland’s Basel-based MDPI … published a paper in Sanskrit for the first time. Biomolecules, one of the peer-reviewed journals under MDPI, has carried video abstracts of the paper on a medicinal herb, but with English subtitles. … The Patanjali research paper, published on 25 January in a special issue of the journal titled ‘Pharmacology of Medicinal Plants’, is on medicinal herb ‘Withania somnifera’, commonly known as ‘ashwagandha’.

This article is painfully flawed.

1. MDPI is a publisher, not a journal. It featured on Beall’s list (with the customary caveats) and has published some obviously problematic papers. I’ve heard good things about some of its titles and bad things about others. The journalist needed to have delineated this aspect instead of taking the simpler fact of publication in a journal at face value. Even then, qualifying a journal as “peer-reviewed” doesn’t cut it anymore. In a time when peer-review can be hacked (thanks to its relative opacity) and the whole publishing process subverted for profit, all journalists writing on matters of science – as opposed to just science journalists – need to perform their own checks to certify the genealogy of a published paper, especially if the name of the journal(s) and its exercise of peer-review are being employed in the narrative as markers of authority.

2. People want to publish research in English so others can discover and build on it. A paper written in Sanskrit is a gimmick. The journalist should have clarified this point instead of letting Ramdev’s minions (among the authors of the paper) claim brownie points for their feat. It’s a waste of effort, time and resources. More importantly, The Print has conjured a virtue out of thin air and broadcast asinine claims like “This is the first step towards the acceptance of ‘Sanskrit language’ in the field of research among the international community.”

3. The article has zero critique of the paper’s findings, no independent comments and no information about the study’s experimental design. This is the sort of nonsense that an unquestioning commitment to objectivity in news allows: reporters can’t just write that someone said something if what they said is wrong, misleading, harmful or all three. Magnifying potentially indefensible claims relating to scientific knowledge – or knowledge that desires the authority of science’s approval – without contextualising them and fact-checking them where necessary may be objective but it is also a public bad. It pays to work with the assumption (even when it doesn’t apply) that at least 50% of your readers don’t know better. That way, even if only 1% of them actually don’t know better – an extremely conservative estimate for audiences in India, and a number that can easily run into the thousands – you avoid misinforming them by failing to communicate enough.

4. A worryingly tendentious statement appears in the middle of the piece: “The study proves that WS seeds help reduce psoriasis,” the journalist writes, without presenting any evidence that she checked. It seems possible that the journalist believes she is simply reporting the occurrence of a localised event – in the form of the context-limited proof published in a paper – without acknowledging that proving a hypothesis is a process, not an event, in that it is ongoing. This process-like character is also somewhat agnostic of the certainty of the experiment’s conclusions: even if one scientist has established with 100% confidence that the experiment they designed has sustained their hypothesis, and published their results in a legitimate preprint repository and/or a journal, other scientists will need to replicate the test, and yet others are likely to have questions they’ll need answered.

5. The experiment was conducted in mice, not humans. Cf. @justsaysinmice

6. “‘We will definitely monetise the findings. We will be using the findings to launch our own products under the cosmetics and medicine category,’ Acharya [the lead author] told ThePrint.” It’s worrying to discover that the authors of the paper, and Baba Ramdev, who funded them, plan to market a product based on just one study, in mice, in a possibly questionable paper, without any independent comments about the findings’ robustness or tenability, to many humans who may not know better. But the journalist hasn’t pressed Acharya or any of the other authors on questions about the experiment or their attempt to grab eyeballs by writing and speaking in Sanskrit, or on how they plan to convince the FSSAI to certify a product for humans based on a study in mice.

The scientist as inadvertent loser

Twice this week, I had occasion to write about how science is an immutably human enterprise, and therefore some of its loftier ideals are aspirational at best, and about how transparency is one of the chief USPs of preprint repositories and post-publication peer-review. As if on cue, I stumbled upon a strange case of extreme scientific malpractice that bears out both points of view.

In an article published January 30, three editors of the Journal of Theoretical Biology (JTB) reported that one of their handling editors had engaged in the following acts:

  1. “At the first stage of the submission process, the Handling Editor on multiple occasions handled papers for which there was a potential conflict of interest. This conflict consisted of the Handling Editor handling papers of close colleagues at the Handling Editor’s own institute, which is contrary to journal policies.”
  2. “At the second stage of the submission process when reviewers are chosen, the Handling Editor on multiple occasions selected reviewers who, through our investigation, we discovered was the Handling Editor working under a pseudonym…”
  3. Many forms of reviewer coercion
  4. “In many cases, the Handling Editor was added as a co-author at the final stage of the review process, which again is contrary to journal policies.”

On the back of these acts of manipulation, this individual – whom the editors chose not to name for unknown reasons, though one of them all but identified him on Twitter as Kuo-Chen Chou (an identification backed up by an independent user) – proudly trumpets his ‘achievements’ on his website.

The same webpage also declares that Chou “has published over 730 peer-reviewed scientific papers” and that “his papers have been cited more than 71,041 times”.

Without transparency[a] and without the right incentives, the scientific process – which I use loosely to denote all activities and decisions associated with synthesising, assimilating and organising scientific knowledge – becomes just as conducive to misconduct and unscrupulousness as any other enterprise if only because it allows people with even a little more power to exploit others’ relative powerlessness.

[a] Ironically, the JTB article lies behind a paywall.

In fact, Chou had also been found guilty of similar practices when working with a different journal, Bioinformatics, and an article its editors published last year is cited prominently in the article by JTB’s editors.

Even if the JTB and Bioinformatics cases seem exceptional for their editors having failed to weed out gross misconduct shortly after its first occurrence – they’re not; but although there are many such cases, they are still likely to be in the minority (an assumption on my part) – a completely transparent review process eliminates such possibilities and, more importantly, naturally renders the process trustless[b]. That is, you shouldn’t have to trust a reviewer to do right by your paper; the system itself should be designed such that there is no opportunity for a reviewer to do wrong.

[b] As in trustlessness, not untrustworthiness.

Second, it seems Chou accrued over 71,000 citations because the number of citations has become a proxy for research excellence irrespective of whether the underlying research is actually excellent – a product of the unavoidable growth of a system in which evaluators replaced a complex combination of factors with a single number. As a result, Chou and others like him have been able to ‘hack’ the system, so to speak, and distort the scientific literature (which you might’ve seen as the stack of journals in a library representing troves of scientific knowledge).

But as long as the science is fine, no harm done, right? Wrong.

If you visualised the various authors of research papers as points and the citations connecting them as lines, an inordinate number of those lines would converge on the point representing Chou – and they would be wrong to, led there not by Chou’s prowess as a scientist but by his abilities as a credit-thief and extortionist.
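
To make the picture concrete, here is a minimal, hypothetical sketch of such a citation graph using the networkx library; the author names and edges are invented for illustration and have nothing to do with the actual literature.

```python
# A toy citation graph: authors are nodes and a directed edge u -> v
# means "a paper by u cites a paper by v". All names are hypothetical.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("author_A", "author_Z"), ("author_B", "author_Z"),
    ("author_C", "author_Z"), ("author_D", "author_Z"),
    ("author_A", "author_B"), ("author_C", "author_D"),
])

# In-degree counts incoming citations; a coerced or self-dealt citation
# inflates this number exactly as much as a legitimate one does.
for author, citations in sorted(G.in_degree(), key=lambda pair: -pair[1]):
    print(author, citations)
```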

This graphing exercise isn’t simply a form of visual communication. Imagine your life as a scientist as a series of opportunities, where each opportunity is contested by multiple people and the people in charge of deciding who ‘wins’ at each stage are lacking in some or all of training, compensation and support. If X ‘loses’ at one of the early stages and Y ‘wins’, Y has a commensurately greater chance of winning a subsequent contest, and X a lower one. Such contests often determine the level of funding, access to suitable guidance and even networking possibilities, so over multiple rounds – by virtue of the evaluators at each step having more reasons to be impressed by Y’s CV because, say, it lists more citations, and fewer reasons to be impressed with X’s – X ends up with more reasons to exit science and switch careers.
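
As a rough illustration – and only an illustration, with entirely made-up parameters – here is a small simulation of two equally able scientists competing for a series of opportunities, where each win slightly raises the winner’s odds in the next contest:

```python
import random

def career(rounds=20, boost=0.05, seed=None):
    """Two equally able scientists, X and Y, contest `rounds` opportunities.
    Each win nudges the winner's chance of taking the next contest up by
    `boost`. The parameters are arbitrary and purely illustrative."""
    rng = random.Random(seed)
    p_y = 0.5                      # Y's chance of winning the next contest
    wins = {"X": 0, "Y": 0}
    for _ in range(rounds):
        winner = "Y" if rng.random() < p_y else "X"
        wins[winner] += 1
        p_y += boost if winner == "Y" else -boost
        p_y = min(max(p_y, 0.05), 0.95)   # keep the probability sensible
    return wins

# A single early, effectively random win tends to snowball over a career.
print([career(seed=s) for s in range(5)])
```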

Additionally, because of the resources that Y has received opportunities to amass, they’re in a better position to conduct even more research, ascend to even more influential positions and – if they’re so inclined – accrue even more citations through means both straightforward and dubious. To me, such prejudicial biasing resembles the evolution of a Lorenz attractor: the initial conditions might appear to be the same to some approximation, but for a single trivial choice, one scientist ends up being disproportionately more successful than another.
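
The Lorenz analogy can be made literal in a few lines; the sketch below uses the standard Lorenz parameters and a crude Euler integration (my choice, for brevity) to start two trajectories a hair’s breadth apart and watch them separate:

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One crude Euler step of the Lorenz system."""
    x, y, z = state
    return (x + sigma * (y - x) * dt,
            y + (x * (rho - z) - y) * dt,
            z + (x * y - beta * z) * dt)

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-6, 1.0, 1.0)   # a 'trivially' different initial condition

for step in range(3001):
    if step % 1000 == 0:
        separation = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"t = {step * 0.01:5.1f}  separation = {separation:.6f}")
    a, b = lorenz_step(a), lorenz_step(b)
```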

The answer, of course, comprises many things, including better ways to evaluate and reward research. Two of them in turn have to be eliminating the use of numbers to denote human abilities, and making the journey of a manuscript from the lab to the wild as free of opaque, and therefore potentially arbitrary, decision-making as possible.

Featured image: A still from an animation showing the divergence of nearby trajectories on a Lorenz system. Caption and credit: MicoFilós/Wikimedia Commons, CC BY-SA 3.0.

The cycle

Is it just me or does everyone see a self-fulfilling prophecy here?

For a long time, and assisted ably by the ‘publish or perish’ paradigm, researchers sought to have their papers published in high-impact-factor journals – a.k.a. prestige journals – like Nature.

Such journals in turn, assisted ably by parasitic strategies, made these papers highly visible to other researchers around the world and, by virtue of being high-IF journals, tainted the results in the papers with a measure of prestige, ergo importance.

Evaluation and awards committees in turn paid more attention to these papers than to others and picked their authors for rewards over others, further amplifying their work, increasing the opportunity cost incurred by the researchers who lost out, and increasing the prestige attached to the high-IF journals.

Run this cycle a few million times and you end up with the impression that there’s something journals like Nature get right – when in fact it’s just mostly a bunch of business practices to ensure they remain profitable.

Confused thoughts on embargoes

Seventy! That’s how many observatories around the world turned their antennae to study the neutron-star collision that LIGO first detected. So I don’t know why the LIGO Collaboration, and Nature, bothered to embargo the announcement and, more importantly, the scientific papers of the LIGO-Virgo collaboration as well as those by the people at all these observatories. That’s a lot of people and many of them leaked the neutron-star collision news on blogs and on Twitter. Madness. I even trawled through arΧiv to see if I could find preprint copies of the LIGO papers. Nope; it’s all been removed.

Embargoes create hype from which journals profit. Everyone knows this. Instead of dumping the data along with the scientific articles as soon as they’re ready, journals like Nature, Science and others announce that the information will all be available at a particular time on a particular date. And between this announcement and the moment at which the embargo lifts, the journal’s PR team fuels hype surrounding whatever’s being reported. This hype is important because it generates interest. And if the information promises to be good enough, the interest in turn creates ‘high pressure’ zones on the internet – populated by those people who want to know what’s going on.

Search engines and news aggregators like Google and Facebook are sensitive to the formation of these high-pressure zones and, at the time of the embargo’s lifting, watch out for news publications carrying the relevant information. And after the embargo lifts, thanks to the attention already devoted by the aggregators, news websites are transformed into ‘low pressure’ zones into which the aggregators divert all the traffic. It’s like the moment a giant information bubble goes pop! And the journal profits from all of this because, while the bubble is building, the journal’s name is everywhere.

In short: embargoes are a traffic-producing opportunity for news websites because they create ‘pseudo-cycles of news’, and an advertising opportunity for journals.

But what’s in it for someone reporting on the science itself? And what’s in it for the consumers? And, overall, am I being too vicious about the idea?

For science reporters, there’s the Ingelfinger rule, promulgated by the New England Journal of Medicine in 1969. It states that the journal will not publish any paper whose results have been previously published elsewhere and/or whose authors have already discussed the results with the media. NEJM defended the rule by claiming it was to keep their output fresh and interesting as well as to prevent scientists from getting carried away by the implications of their own research (NEJM’s peer-review process would prevent that, they said). In the end, the consumers would receive scientific information that had been thoroughly vetted.

While the rule makes sense from the scientists’ point of view, it doesn’t from the reporters’. For one, a good science reporter, having chosen to cover a certain paper, will present it to an expert unaffiliated with the authors and working in the same area for her judgment – a form of peer-review extraneous to the journal publishing the paper. Second, a pro-embargo argument that’s been advanced is that embargoes alert science reporters to papers of importance as well as give them time to write a good story.

I’m conflicted about this. Embargoes, and the attendant hype, do help science reporters pick up on a story they might’ve missed, and to capitalise on the traffic potential of an announcement that may not have become as big without the embargo. Case in point: today’s neutron-star collision announcement. At the same time, science reporters constantly pick up on interesting research that is considered old/stale or was never embargoed and write great stories about it. Case in point: almost everything else.

My perspective is coloured by the fact that I manage a very small science newsroom at The Wire. I have a very finite monthly budget (roughly what someone working eight hours a day, five days a week, would make in two months on the US minimum wage), with which I have to ensure that all my writers – who are all freelancers – provide both the big picture of science in that month and the important nitty-gritty. Embargoes, for me, are good news because they help me reallocate human and financial resources for a story well in advance and make The Wire‘s presence felt on the big stage when the curtain lifts. Rather, even if I can’t make it on time to the moment the curtain lifts, I’ve still got what I know for sure is a good story on my hands.

A similar point was made by Kent Anderson when he wrote about eLife‘s media policy, which said that the journal would not be enforcing the Ingelfinger rule, over at The Scholarly Kitchen:

By waiving the Ingelfinger rule in its modernised and evolved form – which still places a premium on embargoes but makes pre-publication communications allowable as long as they don’t threaten the news power – eLife is running a huge risk in the attention economy. Namely, there is only so much time and attention to go around, and if you don’t cut through the noise, you won’t get the attention. …

Like it or not, but press embargoes help journals, authors, sponsors, and institutions cut through the noise. Most reporters appreciate them because they level the playing field, provide time to report on complicated and novel science, and create an effective overall communication scenario for important science news. Without embargoes and coordinated media activity, interviews become more difficult to secure, complex stories may go uncovered because they’re too difficult to do well under deadline pressures, and coverage becomes more fragmented.

What would I be thinking if I had a bigger budget and many full-time reporters to work with? I don’t know.

On Embargo Watch in July this year, Ivan Oransky wrote about how an editor wasn’t pleased with embargoes because “staffers had been pulled off other stories to make sure to have this one ready by the original embargo”. I.e., embargoes create deadlines that are not in your control; they create deadlines within which everyone, over time, tends to do the bare minimum (“as much as other publications will do”) so they can ride the interest wave and move on to other things – sometimes without ever revisiting the story. In a separate post, Oransky briefly reviewed a book against embargoes by Vincent Kiernan, a noted critic of the idea:

In his book, Embargoed Science, Kiernan argues that embargoes make journalists lazy, always chasing that week’s big studies. They become addicted to the journal hit, afraid to divert their attention to more original and enterprising reporting because their editors will give them grief for not covering that study everyone else seems to have covered.

Alice Bell wrote a fantastic post in 2010 about how to overcome such tendencies: newsrooms should redistribute their attention on science to both upstream and downstream activities. But more than that, I don’t think lethargic news coverage can be explained solely by the addiction to embargoes. A good editor should keep stirring the pot – should keep her journalists moving on good stories, particularly of the kind no one wants to talk about, reporting on them and playing them up. So, while I’m hoping that The Wire‘s coverage of the neutron-star collision discovery is a hit, I’ve also got great pieces coming this week about solar flares, open-access publishing, the health effects of ******** mining and the conservation of sea snakes.

I hope time will provide some clarity.

Featured image credit: Free-Photos/pixabay.

A conference’s peer-review was found to be sort of random, but whose fault is it?

It’s not a good time for peer-review. Sure, if you’ve been a regular reader of Retraction Watch, it’s never been a good time for peer-review. But aside from that, the process has increasingly been bearing the brunt of criticism for not being able to stem the publication of results that – after publication – have been found to be the product of bad research practices.

The problem may be that the reviewers are letting the ‘bad’ papers through but the bigger issue is that, while the system itself has been shown to have many flaws – not excluding personal biases – journals rely on the reviewers and naught else to stamp accepted papers with their approval. And some of those stamps, especially from Nature or Science, are weighty indeed. Now add to this muddle the NIPS wrangle, where researchers may have found that some peer-reviews are just arbitrary.

NIPS stands for the Neural Information Processing Systems (Foundation), whose annual conference was held in the second week of December 2014 in Montreal. It’s considered one of the main conferences in the field of machine learning. Around that time, two attendees – Corinna Cortes and Neil Lawrence – performed an experiment to judge how arbitrary the conference’s peer-review could get.

Their modus operandi was simple. All the papers submitted to the conference were peer-reviewed before they were accepted. Cortes and Lawrence routed a tenth of all submitted papers through a second peer-review stage, and observed which papers were accepted or rejected in the second stage. (According to Eric Price, NIPS ultimately accepted a paper if either group of reviewers accepted it.) Their findings were distressing.

About 57%* of all papers accepted in the first review were rejected during the second review. To be sure, each stage of the review was presumably equally competent – it wasn’t as if the second stage was more stringent than the first. That said, 57% is a very big number. More than five times out of 10, peer-reviewers disagreed on what could be published. In other words, in an alternate universe, the same conference but with only the second group of reviewers in place was generating different knowledge.

Lawrence was also able to eliminate a possibly redeeming confounding factor, which he described in a Facebook discussion on this experiment:

… we had a look through the split decisions and didn’t find an example where the reject decision had found a ‘critical error’ that was missed by the accept. It seems that there is quite a lot of subjectivity in these things, which I suppose isn’t that surprising.

It doesn’t bode well that the NIPS conference is held in some esteem among its attendees for having one of the better reviewing processes. Including the 90% of the papers that did not go through a second peer-review, the total predetermined acceptance rate was 22%, i.e. reviewers were tasked with accepting 22 papers out of every 100 submitted. Put another way, the reviewers were rejecting 78%. And this sheds light on a more troubling way of looking at their actions.

If the reviewers had been rejecting papers at random, they would have done so at the tasked rate of 78%. At NIPS, one can only hope that they weren’t – so the second group was purposefully rejecting 57% of the papers that the first group had accepted. In an absolutely non-random, consistent world, this number should have been 0%. That the observed 57% is closer to 78% than it is to 0% implies that some of the rejection was effectively random. Hmm.
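
A toy simulation makes the two reference points concrete (the 22% acceptance rate is from the post; everything else here is a made-up illustration): two committees that each accept 22% of papers completely at random and independently will behave exactly as described, with the second rejecting about 78% of what the first accepts, whereas two perfectly consistent committees would disagree 0% of the time.

```python
import random

def rejection_rate_by_second_committee(n_papers=100_000, accept_rate=0.22, seed=0):
    """Fraction of papers accepted by committee 1 that a fully random,
    independent committee 2 then rejects. Purely illustrative."""
    rng = random.Random(seed)
    accepted_by_1 = rejected_by_2 = 0
    for _ in range(n_papers):
        c1_accepts = rng.random() < accept_rate
        c2_accepts = rng.random() < accept_rate
        if c1_accepts:
            accepted_by_1 += 1
            if not c2_accepts:
                rejected_by_2 += 1
    return rejected_by_2 / accepted_by_1

print(rejection_rate_by_second_committee())   # ~0.78, the fully random baseline
```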

While this is definitely cause for concern, forging ahead on the basis of arbitrariness – which machine-learning theorist John Langford defines as the probability that the second group rejects a paper that the first group has accepted – wouldn’t be the right way to go about it. This is similar to the case with A/B-testing: we have a test whose outcome can be used to inform our consequent actions, but using the test itself as a basis for the solution wouldn’t be right. For example, the arbitrariness can be reduced to 0% simply by having both groups accept every nth paper – a meaningless exercise.

Is our goal to reduce the arbitrariness to 0% at all? You’d say ‘Yes’, but consider the volume of papers being submitted to important conferences like NIPS and the number of reviewer-hours available to evaluate them. In the history of conferences, surely some judgments must have been arbitrary simply for reviewers to fulfil their responsibilities to their employers. So you see the bigger issue: the fault lies not with the reviewers alone as much as with the so-called system.

Langford’s piece raises a similarly confounding topic:

Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?

Problems like these are necessarily difficult to solve because of the number of players involved. In fact, it wouldn’t be entirely surprising if we found that nobody or no institution was at fault except how they were all interacting with each other, and not just in fields like machine-learning. A study conducted in January 2015 found that minor biases during peer-review could result in massive changes in funding outcomes if the acceptance rate was low – such as with the annual awarding of grants by the National Institutes of Health. Even Nature is wary about the ability of its double-blind peer-review to solve the problems ailing normal ‘peer-review’.

For the near future, the only takeaway is likely going to be that ambitious young scientists will have to remember, first, that acceptance – just as much as rejection – can be arbitrary and, second, that the impact factor isn’t everything. On the other hand, it doesn’t seem possible in the interim to keep from lowering our expectations of peer-review itself.

*The number of papers routed to the second group after the first was 166. The overall disagreement rate was 26%, so they would have disagreed on the fates of 43. And because they were tasked with accepting 22% – which is 37 or 38 – group 1 could be said to have accepted 21 that group 2 rejected, and group 2 could be said to have accepted 22 that group 1 rejected. Between 21/37 (56.7%) and 22/38 (57.8%) is 57%.
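
The footnote’s arithmetic can be reproduced in a few lines (the 166 papers, the 26% disagreement rate and the 22% acceptance rate are from the footnote; the 21/22 split of the disagreements is its assumption):

```python
papers = 166                            # papers routed to the second committee
disagreements = round(0.26 * papers)    # ~43 papers with opposite decisions
acceptances = 0.22 * papers             # ~36.5, i.e. 37 or 38 accepted papers

# Splitting the 43 disagreements roughly in half, as the footnote does:
print(disagreements)                    # 43
print(21 / 37, 22 / 38)                 # ~0.57 and ~0.58, i.e. roughly 57%
```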

Hat-tip: Akshat Rathi.

R&D in China and India

“A great deal of the debate over globalization of knowledge economies has focused on China and India. One reason has been their rapid, sustained economic growth. The Chinese economy has averaged a growth rate of 9-10 percent for nearly two decades, and now ranks among the world’s largest economies. India, too, has grown steadily. After years of plodding along at an average annual increase in its gross domestic product (GDP) of 3.5 percent, India has expanded by 6 percent per annum since 1980, and more than 7 percent since 1994 (Wilson and Purushothaman, 2003). Both countries are expected to maintain their dynamism, at least for the near future.”

– Gereffi et al, ‘Getting the Numbers Right: International Engineering Education in the United States, China and India’, Journal of Engineering Education, January 2008

A June 16 paper in Proceedings of the National Academy of Sciences, titled ‘China’s Rise as a Major Contributor to Science and Technology’, analyses the academic and research environment in China over the last decade or so, and discusses the factors involved in the country’s increasing fecundity in recent years. It concludes that four factors have played an important role in this process:

  1. Large human capital base
  2. A labor market favoring academic meritocracy
  3. A large diaspora of Chinese-origin scientists
  4. A centralized government willing to invest in science

A simple metric they cite to make their point is publication trends by country. Between 2000 and 2010, for example, the number of science and engineering papers published by China increased by 470%. The next highest climb was India’s, at 234%.

Click on the image for an interactive chart.

“The cheaters don’t have to worry they will someday be caught and punished.”

This is a quantitative result. A common criticism of the rising volume of Chinese scientific literature over the last three decades concerns the quality of the research coming out of it. Dramatic increases in research output are often accompanied by a publish-or-perish mindset that fosters a desperation among scientists to get published, leading to padded CVs, falsified data and plagiarism. Moreover, it’s plausible that since R&D funding in China is still controlled by a highly centralized government, the flow of money is restricted and access to it is highly competitive. And when it is government officials that are evaluating science, quantitative results are favored over qualitative ones, reliance on misleading performance metrics increases, and funds are often awarded for areas of research that favor political agendas.

For this, the PNAS paper cites the work of Shi-min Fang, a science writer who won the inaugural John Maddox prize in 2012 for exposing scientific fraud in Chinese research circles. In an interview with NewScientist in November of that year, he explains the source of widespread misconduct:

It is the result of interactions between totalitarianism, the lack of freedom of speech, press and academic research, extreme capitalism that tries to commercialise everything including science and education, traditional culture, the lack of scientific spirit, the culture of saving face and so on. It’s also because there is not a credible official channel to report, investigate and punish academic misconduct. The cheaters don’t have to worry they will someday be caught and punished.

At this point, it’s tempting to draw parallels with India. While China has seen increased funding for R&D…

Click on the chart for an interactive view.

… India has been less fortunate.

Click on the chart for an interactive view.

The issue of funding is slightly different in India, in fact. While Chinese science is obstinately centralized and publicly funded, Indian science is centralized in some parts and decentralized in others; public funding is not high enough, presumably because we lack the meritocratic academic environment, and private funding is not as high as it needs to be.

Click on the image for an interactive chart.

Even though the PNAS paper’s authors say their breakdown of what has driven scientific output from China could inspire changes in other countries, India faces different issues, as the charts above have shown. Indeed, the very first chart shows how, despite the number of published papers having doubled in the last decade, we have only jumped from one small number to another small number.

“Scientific research in India has become the handmaiden of defense technology.”

There is also a definite lack of visibility: little scientific output of any kind is accessible to 1) the common man, and 2) the world outside. Apart from minimal media coverage, there is a paucity of scientific journals – or they exist but are not well known, accessible or both. This Jamia Milia collection lists a paltry 226 journals – including those in regional languages – though it’s likelier that there are hundreds more, both credible and dubious. A journal serves as an aggregation of reliable scientific knowledge, not just for scientists but also for journalists and other reliant decision-makers. It is one place to find the latest developments.

In this context, Current Science appears to be the most favored in the country, not to mention the loneliest. Then again, a couple fingers can be pointed at years of reliance on quantitative performance metrics, which drives many Indian researchers to publish in journals with very high impact factors such as Nature or Science, which are often based outside the country.

In the absence of such lists of Indian and Chinese journals, let’s turn to a table in the PNAS paper showing the average number of citations per article as a percentage of the US figure. It shows both India and China close to 40% in 2010-2011.

The poor showing may not be a direct consequence of low quality. For example, a paper may have detailed research conducted to resolve a niche issue in Indian defense technology. In such a case, the quality of the article may be high but the citability of the research itself will be low. Don’t be surprised if this is common in India given our devotion to the space and nuclear sciences. And perhaps this is what a friend of mine referred to when he said “Scientific research in India has become the handmaiden of defense technology”.

To sum up, although India and China both lag the USA and the EU in the productivity and value of their research (as measured by quantitative metrics, at any rate), China is facing problems associated with the maturity of a voluminous scientific workforce, whereas India is quite far from that maturity. The PNAS paper is available here. If you’re interested in an analysis of engineering education in the two countries, see this paper (from which the opening lines of this post were borrowed).

Replication studies, ceiling effects, and the psychology of science

On May 25, I came across a tweet by Erika Salomon about the episode described below.

The story started when the journal Social Psychology decided to devote a special issue (Volume 45, Number 3 / 2014) to replications, publishing successful and failed replication attempts instead of conventional papers and their conclusions. It accepted proposals from scientists stating which studies they wanted to try to replicate, and registered the accepted ones. This way, the journal’s editors, Brian Nosek and Daniel Lakens, could ensure that a study was published no matter the outcome – successful or not.

All the replication studies were direct replication studies, which means they used the same experimental procedure and statistical methods to analyze the data. And before the replication attempt began, the original data, procedure and analysis methods were scrutinized, and the data was shared with the replicating group. Moreover, an author of the original paper was invited to review the respective proposals and have a say in whether the proposal could be accepted. So much is pre-study.

Finally, the replication studies were performed, and had their results published.


The consequences of failing to replicate a study

Now comes the problem: What if the second group failed to replicate the findings of the first group? There are different ways of looking at this from here on out. The first person such a negative outcome affects is the original study’s author, whose reputation is at stake. Given the gravity of the situation, is the original author allowed to ask for a replication of the replication?

Second, during the replication study itself (and given the eventual negative outcome), how much of a role is the original author allowed to play when performing the experiment, analyzing the results and interpreting them? This could swing both ways. If the original author is allowed to be fully involved during the analysis process, there will be a conflict of interest. If the original author is not allowed to participate in the analysis, the replicating group could get biased toward a negative outcome for various reasons.

Simone Schnall, a psychology researcher from Cambridge writes on the SPSP blog (linked to in the tweet above) that, as an author of a paper whose results have been unsuccessfully replicated and reported in the Special Issue, she feels “like a criminal suspect who has no right to a defense and there is no way to win: The accusations that come with a “failed” replication can do great damage to my reputation, but if I challenge the findings I come across as a “sore loser.””

People on both sides of this issue recognize the importance of replication studies; there’s no debate there. But the presence of these issues calls for clear rules on how replication studies are designed, reviewed and published, backed by an equally firm support structure – or they all run the risk of becoming personalized. Forget who replicates the replicators; it could just as well become who bullies the bullies. And in the absence of such rules, replication studies are being actively disincentivized. Simone Schnall acceded to a request to replicate her study, but the fallout could set a bad example.

During her commentary, Schnall links to a short essay by Princeton University psychologist Daniel Kahneman titled ‘A New Etiquette for Replication‘. In the piece, Kahneman writes, “… tension is inevitable when the replicator does not believe the original findings and intends to show that a reported effect does not exist. The relationship between replicator and author is then, at best, politely adversarial. The relationship is also radically asymmetric: the replicator is in the offense, the author plays defense.”

In this blog post by one of the replicators, the phrase “epic fail” is an example of how things could be personalized. Note: the author of the post has struck out the words and apologized.

In order to eliminate these issues, the replicators could be asked to keep things specific, and various stakeholders have suggested ways to do so. For one, replicators should address the questions and answers raised in the original study, not the author and her/his credentials. Another way is to publish reports of replication results regularly, as part of the routine scientific literature, instead of devoting a special issue to them.

This is one concern that Schnall raises in her answers (in response to question #13): “I doubt anybody would have widely shared the news had the replication been considered “successful.”” So there’s a need to address a bias here: are journals likelier to publish replication studies that fail to replicate previous results? Erasing this bias requires publishers to actively incentivize replication studies.

A paper published in Perspectives on Psychological Science in 2012 paints a slightly different picture. It looks at the number of replication studies published in the field and pegs the replication rate at 1.07%. Despite the low rate, one of the paper’s conclusions was that most published replication studies reported successful, not unsuccessful, replications. It also notes that, among all replication studies published since 2000, the fraction reporting successful outcomes stands at 69.4% and the fraction reporting unsuccessful outcomes at 11.8%.


At the same time, Nosek and Lakens concede in this editorial that, “In the present scientific culture, novel and positive results are considered more publishable than replications and negative results.”


The ceiling effect

Schnall does raise many questions about the replication, including alleging the presence of a ceiling effect. As she describes it (in response to question #8):

“Imagine two people are speaking into a microphone and you can clearly understand and distinguish their voices. Now you crank up the volume to the maximum. All you hear is this high-pitched sound (“eeeeee”) and you can no longer tell whether the two people are saying the same thing or something different. Thus, in the presence of such a ceiling effect it would seem that both speakers were saying the same thing, namely “eeeeee”.

The same thing applies to the ceiling effect in the replication studies. Once a majority of the participants are giving extreme scores, all differences between two conditions are abolished. Thus, a ceiling effect means that all predicted differences will be wiped out: It will look like there is no difference between the two people (or the two experimental conditions).”

She states this as an important reason to get the replicators’ results replicated.
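
As an aside, the ceiling effect itself is easy to demonstrate with a small simulation (the numbers below are entirely made up and have nothing to do with Schnall’s data): two conditions that genuinely differ on average stop looking different once the response scale forces most scores to pile up at its maximum.

```python
import random
import statistics

def mean_rating(true_mean, cap, n=2000, sd=1.5, seed=0):
    """Mean of n simulated ratings drawn around `true_mean`, rounded and
    clipped to a 1..cap response scale. All parameters are illustrative."""
    rng = random.Random(seed)
    scores = [min(max(round(rng.gauss(true_mean, sd)), 1), cap) for _ in range(n)]
    return statistics.mean(scores)

# Conditions A and B differ by one full point in their underlying means...
for cap in (9, 5):   # a roomy scale vs. one where responses hit the ceiling
    a = mean_rating(6.0, cap=cap, seed=0)
    b = mean_rating(7.0, cap=cap, seed=1)
    print(f"cap = {cap}: A = {a:.2f}, B = {b:.2f}, observed difference = {b - a:.2f}")
```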


My opinions

// Schnall’s contention that the presence of a ceiling effect is a reason to have the replicators’ results replicated implies that there could be a problem with the method used to evaluate the authors’ hypothesis. Both the original and the replication studies used the same method, and the emergence of an effect in one of them but not the other implies the “fault”, if it can be called that, could lie with the replicator – for improperly performing the experiment – or with the original author – for choosing an inadequate set-up to verify the hypothesis. Therefore, one thing Schnall felt strongly about, the scrutiny of her methods, should also have been formally outlined: a replication study is not just about the replication of results but about the replication of methods as well.

// Because both papers have passed scrutiny and have been judged worthy of publication, it makes sense to treat them as individual studies in their own right instead of one being a follow up to the other (even though technically that’s what they are), and to consider both together instead of selecting one over the other – especially in terms of the method. This sort of debate gives room for Simone Schnall to publish an official commentary in response to the replication effort and make the process inclusive. In some sense, I think this is also the sort of debate that Ivan Oransky and Adam Marcus think scientific publishing should engender.

// Daniel Lakens explains in a comment on the SPSP blog that the introduction, method and analysis plan were peer-reviewed by the original authors and not by an independent group of experts. This was termed “pre-data peer review”: a review of the methods and not the numbers. It is unclear to what extent this was sufficient, because it’s only with a scrutiny of the numbers that any ceiling effect becomes apparent. While post-publication peer-review can check for this, it’s not formalized (at least in this case) and does little to mitigate Schnall’s situation.

// Schnall’s paper was peer-reviewed. The replicators’ paper was peer-reviewed by Schnall et al. Even if both passed the same level of scrutiny, they didn’t pass the same type of it. On this basis, there might be reason for Schnall to be involved with the replication study. Ideally, however, the replication would have been put through normal, independent peer-review, removing the need for Schnall’s involvement. Apart from the conflict of interest that could arise, a replication study needs to be fully independent to be credible, just as the peer-review process is trusted to be credible because it is independent. So while it is commendable that Schnall shared all the details of her study, it should have been possible for her participation to end there.

// While I’ve disagreed with Kahneman over the previous point, I do agree with point #3 in his essay that describes the new etiquette: “The replicator is not obliged to accept the author’s suggestions [about the replicators’ M.O.], but is required to provide a full description of the final plan. The reasons for rejecting any of the author’s suggestions must be explained in detail.” [Emphasis mine]

I’m still learning about this fascinating topic, so if I’ve made mistakes in interpretations, please point them out.


Featured image: shutterstock/(c)Sunny Forest