Getting rid of the GRE

An investigation by Science has found that, today, just 3% of “PhD programs in eight disciplines at 50 top-ranked US universities” require applicants’ GRE scores, “compared with 84% four years ago”. This is good news about a test whose purpose I could never understand: first as a student who had to take it to apply to journalism programmes, then as a journalist who couldn’t unsee the barriers the test imposed on students from poorer countries with locally tailored learning systems and, yes, not fantastic English. (Before its format was changed in 2011, the test required takers to memorise long lists of obscure English words – an exercise devoid of purpose because they would never remember most of those words.) Obviously many institutes still require prospective students to take the GRE, but the fact that many others are alive to questions about the utility of standardised tests and the barriers they impose on students from different socioeconomic backgrounds is heartening. The Science article also briefly explored what proponents of the GRE have to say, and I’m sure you’ll see below, as I did, that the reasons are flimsy – either because that is the real strength of the arguments on offer or, more likely, because Science hasn’t sampled all the available arguments in favour. This said, the reason offered by a senior member of the company that devises and administers the GRE is instructive.

“I think it’s a mistake to remove GRE altogether,” says Sang Eun Woo, a professor of psychology at Purdue University. Woo is quick to acknowledge the GRE isn’t perfect and doesn’t think test scores should be used to rank and disqualify prospective students – an approach many programs have used in the past. But she and some others think the GRE can be a useful element for holistic reviews, considered alongside qualitative elements such as recommendation letters, personal statements, and CVs. “We’re not saying that the test is the only thing that graduate programs should care about,” she says. “This is more about, why not keep the information in there because more information is better than less information, right?”

Removing test scores from consideration could also hurt students, argues Alberto Acereda, associate vice president of global higher education at the Educational Testing Service, the company that runs the GRE. “Many students from underprivileged backgrounds so often don’t have the advantage of attending prestigious programs or taking on unpaid internships, so using their GRE scores serves [as a] way to supplement their application, making them more competitive compared to their peers.”

Both arguments come across as reasonable – but they’re both undermined by the result of an exercise that the department of Earth and atmospheric sciences at Cornell University conducted in 2020: a group evaluated prospective students’ applications for MS and PhD programmes while keeping the GRE scores hidden. When the scores were revealed, the evaluations weren’t “materially affected”. Obviously the department’s findings are not generalisable – but they indicate the GRE’s redundancy, with the added benefit that evaluators no longer have to weigh the effects of the test’s exorbitant fee (around Rs 8,000 in 2014 and $160 internationally, up to $220 today) on the pool of applicants, or the other pitfalls of using the GRE to ‘rank’ students’ suitability for a PhD programme. Some others quoted in the Science article vouched for “rubric-based holistic reviews”. The meaning of “rubric” in context isn’t clear from the article itself, but the term as a whole seems to mean considering students on a variety of fronts, one of which is their performance on the GRE. This also seems reasonable, but it’s not clear what the GRE brings to the table. One 2019 study found that GRE scores couldn’t usefully predict PhD outcomes in the biomedical sciences. In this context, including the GRE – even as an option – in the application process could discourage some students from applying and/or disadvantage them during admission, due to the test’s requirements (including the fee) as well as – and as a counterexample to Acereda’s reasoning – due to their scores not faithfully reflecting their ability to complete a biomedical research degree. But in another context – admissions to the Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences (GSBS) – researchers reported in 2019 that the GRE might be useful to “extract meaning from quantitative metrics” when employed as part of a “multitiered holistic” admissions process, but that by itself it could disproportionately triage out Black, Native and Hispanic applicants. Taken together, more information is not necessarily better than less information, especially when there are other barriers to acquiring the ‘more’ bits.

Finally, while evaluators might enjoy the marginal utility of redundancy, as a way to ‘confirm’ their decisions, it’s an additional and significant source of stress, and consumer of time, for all test-takers. This is in addition to a seemingly inescapable diversity-performance tradeoff, which cuts beyond the limited question of whether one standardised test is a valid predictor of students’ future performance to the heart of what a higher-education course is for. That is, should institutes consider diversity at the expense of students’ performance? The answer depends on the way each institute is structured, what its goal is and what it measures to that end. One that is focused on its members publishing papers in ‘high IF’ journals, securing high-value research grants, developing high h-indices and maintaining the institute’s own glamorous reputation is likely to see a ‘downside’ to increasing diversity. An institute focused on engendering curiosity, adherence to critical thinking and research methods, and developing blue-sky ideas is likely not to. But while the latter sounds great (strictly in the interests of science), it may be impractical from the point of view of helping tackle society’s problems and of fostering accountability in the scientific enterprise at large. The ideal institute lies somewhere in between these extremes: its admission process will need to assume a little more work – work that the GRE currently abstracts away into a single score – in exchange for the liberty to decouple from university rankings, impact factors, ‘prestige’ and other such preoccupations.

Defending philosophy of science

From Carl Bergstrom’s Twitter thread about a new book called How Irrationality Created Modern Science, by Michael Strevens:

https://twitter.com/CT_Bergstrom/status/1372811516391526400

The Iron Rule from the book is, in Bergstrom’s retelling, “no use of philosophical reasoning in the mode of Aristotle; no leveraging theological or scriptural understanding in the mode of Descartes. Formal scientific arguments must be sterilised, to use Strevens’s word, of subjectivity and non-empirical content.” I was particularly taken by the use of the term ‘individual’ in the tweet I’ve quoted above. The point about philosophical argumentation being an “individual” technique is important and often understated.

There are some personal techniques we use to discern some truths but which we don’t publicise. But the more we read and converse with others doing the same things, the more we may find that everyone has many of the same stand-ins – tools or methods that we haven’t empirically verified to be true and/or legitimate but which we have discerned, based on our experiences, to be suitably good guiding lights.

I discovered this issue first when I read Paul Feyerabend’s Against Method many years ago, and then in practice when I found, while reporting some stories, that scientists in different situations often developed similar proxies for processes that couldn’t be performed in full due to resource constraints. But they seldom spoke to each other (especially across institutes), thus allowing an idealised view of how to do something to ossify even as almost everyone deviated from it in similar ways.

A very common example of this is scientists evaluating papers based on the ‘prestigiousness’ and/or impact factors of the journals the papers are published in, instead of based on their contents – often simply for lack of time and proper incentives. As a result, ideas like “science is self-correcting” and “science is objective” persist as ideals because they’re the result of applying the Iron Rule to the way the products of one’s research are disseminated.

But “by turning a lens on the practice of science itself,” to borrow Bergstrom’s words, philosophies of science allow us to spot deviations from the prescribed normal – a normal promulgated by “Iron Rule Ecclesiastics” like Richard Dawkins – and, particularly valuable to me, to see how we really do it and how we can become better at it. Or as Bergstrom put it: “By understanding how norms and institutions create incentives to which scientists respond …, we can find ways to nudge the current system toward greater efficiency.”

(It is also a bit gratifying to see the book, as well as Bergstrom, pick on Lawrence Krauss. The book goes straight into my reading list.)

The scientist as inadvertent loser

Twice this week, I had occasion to write about how science is an immutably human enterprise, and therefore some of its loftier ideals are aspirational at best, and about how transparency is one of the chief USPs of preprint repositories and post-publication peer-review. As if on cue, I stumbled upon a strange case of extreme scientific malpractice that bears out both points of view.

In an article published January 30, three editors of the Journal of Theoretical Biology (JTB) reported that one of their handling editors had engaged in the following acts:

  1. “At the first stage of the submission process, the Handling Editor on multiple occasions handled papers for which there was a potential conflict of interest. This conflict consisted of the Handling Editor handling papers of close colleagues at the Handling Editor’s own institute, which is contrary to journal policies.”
  2. “At the second stage of the submission process when reviewers are chosen, the Handling Editor on multiple occasions selected reviewers who, through our investigation, we discovered was the Handling Editor working under a pseudonym…”
  3. Many forms of reviewer coercion
  4. “In many cases, the Handling Editor was added as a co-author at the final stage of the review process, which again is contrary to journal policies.”

On the back of these acts of manipulation, this individual – whom the editors chose not to name, for unknown reasons, but whom one of them all but identified on Twitter as one Kuo-Chen Chou (an identification backed up by an independent user) – proudly trumpets his ‘achievements’ on his website.

The webpage declares that Chou “has published over 730 peer-reviewed scientific papers” and that “his papers have been cited more than 71,041 times”.

Without transparencyᵃ and without the right incentives, the scientific process – which I use loosely to denote all activities and decisions associated with synthesising, assimilating and organising scientific knowledge – becomes just as conducive to misconduct and unscrupulousness as any other enterprise if only because it allows people with even a little more power to exploit others’ relative powerlessness.

a. Ironically, the JTB article lies behind a paywall.

In fact, Chou had also been found guilty of similar practices when working with a different journal, Bioinformatics, and an article its editors published last year is cited prominently in the article by JTB’s editors.

Even if the JTB and Bioinformatics cases seem exceptional for their editors having failed to weed out gross misconduct shortly after its first occurrence – they’re not; although there are many such cases, they are still likely to be in the minority (an assumption on my part) – a completely transparent review process eliminates such possibilities and, more importantly, naturally renders the process trustlessᵇ. That is, you shouldn’t have to trust a reviewer to do right by your paper; the system itself should be designed such that there is no opportunity for a reviewer to do wrong.

b. As in trustlessness, not untrustworthiness.

Second, it seems Chou accrued over 71,000 citations because the number of citations has become a proxy for research excellence irrespective of whether the underlying research is actually excellent – a product of the unavoidable growth of a system in which evaluators replaced a complex combination of factors with a single number. As a result, Chou and others like him have been able to ‘hack’ the system, so to speak, and distort the scientific literature (which you might’ve seen as the stack of journals in a library representing troves of scientific knowledge).

But as long as the science is fine, no harm done, right? Wrong.

If you visualised the various authors of research papers as points and the lines connecting them to each other as citations, an inordinate number of lines would converge on the point representing Chou – and they would be misleading, drawn there not by Chou’s prowess as a scientist but by his abilities as a credit-thief and extortionist.

This graphing exercise isn’t simply a form of visual communication. Imagine your life as a scientist as a series of opportunities, where each opportunity is contested by multiple people and the people in charge of deciding who ‘wins’ at each stage aren’t well-trained, well-compensated or well-supported (or are only some of these). If X ‘loses’ at one of the early stages and Y ‘wins’, Y has a commensurately greater chance of winning a subsequent contest and X, a lower one. Such contests often determine the level of funding, access to suitable guidance and even networking possibilities, so over multiple rounds – by virtue of the evaluators at each step having more reasons to be impressed by Y’s CV because, say, they had more citations, and fewer reasons to be impressed with X’s – X ends up with more reasons to exit science and switch careers.
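To make the compounding concrete, here is a minimal sketch – a toy model of my own, with made-up numbers, not drawn from any study – of two equally able scientists whose odds in each successive contest are weighted by the credit they have already accumulated:

```python
import random

# Toy model: X and Y start with equal 'credit' (citations, grants, visibility).
# Each contest is won with probability proportional to current credit, and each
# win adds a little more credit, making the next win likelier.
def career(rounds=10, boost=0.5, seed=None):
    rng = random.Random(seed)
    credit = {"X": 1.0, "Y": 1.0}   # identical starting conditions
    for _ in range(rounds):
        total = credit["X"] + credit["Y"]
        winner = "X" if rng.random() < credit["X"] / total else "Y"
        credit[winner] += boost     # winning begets winning
    return credit

# Averaged over many simulated careers, a substantial gap opens up between the
# eventual 'winner' and 'loser' even though both started out identical.
runs = [career(seed=i) for i in range(1000)]
avg_gap = sum(abs(r["X"] - r["Y"]) for r in runs) / len(runs)
print(f"average credit gap after 10 rounds: {avg_gap:.2f}")
```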

Additionally, because of the resources Y has had the opportunity to amass, they’re in a better position to conduct even more research, ascend to even more influential positions and – if they’re so inclined – accrue even more citations through means both straightforward and dubious. To me, such prejudicial biasing resembles the evolution of a Lorenz attractor: the initial conditions might appear to be the same to some approximation, but for a single trivial choice, one scientist ends up being disproportionately more successful than another.
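For readers unfamiliar with the analogy, here is a rough sketch – a crude Euler integration with the standard parameters, nothing more – of how two trajectories of the Lorenz system that begin a hair’s breadth apart end up wildly separated:

```python
import numpy as np

# Lorenz system with the standard parameters sigma=10, rho=28, beta=8/3,
# advanced with simple Euler steps (crude, but fine for illustration).
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return state + dt * np.array([
        sigma * (y - x),
        x * (rho - z) - y,
        x * y - beta * z,
    ])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # a single, trivial difference
for step in range(1, 3001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 1000 == 0:
        print(f"t = {step * 0.01:4.0f}: separation = {np.linalg.norm(a - b):.3e}")
```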

The solution of course comprises many things, including better ways to evaluate and reward research, and two of them in turn have to be eliminating the use of single numbers to denote human abilities and making the journey of a manuscript from the lab to the wild as free as possible of opaque, and therefore potentially arbitrary, decision-making.

Featured image: A still from an animation showing the divergence of nearby trajectories on a Lorenz system. Caption and credit: MicoFilós/Wikimedia Commons, CC BY-SA 3.0.

To see faces where there are none

This week in “neither university press offices nor prestigious journals know what they’re doing”: a professor emeritus at Ohio University who claimed he had evidence of life on Mars, and whose institution’s media office crafted a press release to publicise his ‘findings’ without thinking twice, and the paper that Nature Medicine published in 2002, cited 900+ times since, that has been found to contain multiple instances of image manipulation.

I’d thought the professor’s case would remain obscure because it’s evidently crackpot, but this morning articles from Space.com and Universe Today showed up on my Twitter feed setting the record straight: the insects the OU entomologist had found in pictures of Mars taken by the Curiosity rover were just artefacts of his (insectile) pareidolia. Some people have called this science journalism in action but I’d say it’s somewhat offensive to check if science journalism still works by gauging its ability, and initiative, to counter conspiracy theories, the lowest of low-hanging fruit.

The press release, which has since been taken down. Credit: EurekAlert and Wayback Machine

The juicier item on our plate is the Nature Medicine paper, the problems in which research integrity super-sleuth Elisabeth Bik publicised on November 21, and which has a science journalism connection as well.

Remember the anti-preprints article Nature News published in July 2018? Its author, Tom Sheldon, a senior press manager at the Science Media Centre, London, argued that preprints “promoted confusion” and that journalists who couldn’t bank on peer-reviewed work ended up “misleading millions”. In other words, it would be better if we got rid of preprints and journalists deferred only to the authority of peer-reviewed papers curated and published by journals, like Nature. Yet here we are today, with a peer-reviewed manuscript published in Nature Medicine whose checking process couldn’t pick up on repetitive imagery. Is this just another form of pareidolia, to see a sensational result – knowing prestigious journals’ fondness for such results – where there was actually none?

(And before you say this is just one paper, read this analysis: “… data from several lines of evidence suggest that the methodological quality of scientific experiments does not increase with increasing rank of the journal. On the contrary, an accumulating body of evidence suggests the inverse: methodological quality and, consequently, reliability of published research works in several fields may be decreasing with increasing journal rank.” Or this extended critique of peer-review on Vox.)

This isn’t an argument against the usefulness, or even need for, peer-review, which remains both useful and necessary. It’s an argument against ludicrous claims that peer-review is infallible, advanced in support of the even more ludicrous argument that preprints should be eliminated to enable good journalism.

The cycle

Is it just me or does everyone see a self-fulfilling prophecy here?

https://twitter.com/nature/status/1192129029924634625

For a long time, and assisted ably by the ‘publish or perish’ paradigm, researchers sought to have their papers published in high-impact-factor journals – a.k.a. prestige journals – like Nature.

Such journals in turn, assisted ably by parasitic strategies, made these papers highly visible to other researchers around the world and, by virtue of being high-IF journals, tainted the results in the papers with a measure of prestige, ergo importance.

Evaluation and awards committees in turn were more aware of these papers than of others and picked their authors for rewards, further amplifying their work, increasing the opportunity cost incurred by the researchers who lost out, and increasing the prestige attached to the high-IF journals.

Run this cycle a few million times and you end up with the impression that there’s something journals like Nature get right – when in fact it’s just mostly a bunch of business practices to ensure they remain profitable.

Why are the Nobel Prizes still relevant?

Note: A condensed version of this post has been published in The Wire.

Around this time last week, the world had nine new Nobel Prize winners in the sciences (physics, chemistry and medicine), all but one of whom were white and none of whom were women. Before the announcements began, Göran Hansson, the Swede-in-chief of these prizes, had said the selection committee had been taking steps to make the group of laureates more inclusive in terms of race and gender, but these seem to be incremental measures, as one editorial in the journal Nature pointed out.

Hansson and co. seem to find tenable the argument that the Nobel Prizes reward achievements from a time when there weren’t many women in science – when in fact it distracts from the selection committee’s bizarre oversight of such worthy names as Lise Meitner, Vera Rubin and Chien-Shiung Wu. But Hansson needs to understand that the only meaningful change is change that happens right away because, even with this significant flaw – one that should by all means have diminished the prizes to a contest of, for and by men – the Nobel Prizes have only marginally declined in reputation.

Why do they matter when they clearly shouldn’t?

For example, the most common comments received in response to articles by The Wire shared on Twitter and Facebook – always from men – argue that the prizes reward excellence, and that excellence should brook no reservation, whether by caste or gender. As is likely obvious to many readers, this view of scholastic achievement resembles a blade of grass: long, sprouting from the ground (the product of strong roots but out of sight, out of mind), rising straight up and culminating in a sharp tip.

However, achievement is more like a jungle: the scientific enterprise – encompassing research institutions, laboratories, the scientific publishing industry, administration and research funding, social security, availability of social capital, PR, discoverability and visibility, etc. – incorporates many vectors of bias, discrimination and even harassment towards its more marginalised constituents. Your success is not your success alone; and if you’re an upper-caste, upper-class, English-speaking man, you should ask yourself, as many such men have been prompted to in various walks of life, who you might have displaced.

This isn’t a witch-hunt as much as an opportunity to acknowledge how privilege works and what we can do to make scientific work more equal, equitable and just in future. But the idea that research is a jungle and research excellence is a product of the complex interactions happening among its thickets hasn’t found meaningful purchase, and many people still labour under a comically straightforward impression that science is immune to social forces. Hansson might be one of them if his interview to Nature is anything to go by, where he says:

… we have to identify the most important discoveries and award the individuals who have made them. If we go away from that, then we’ve devalued the Nobel prize, and I think that would harm everyone in the end.

In other words, the Nobel Prizes are just going to look at the world from the top, and probably from a great distance too, so the jungle has been condensed to a cluster of pin-pricks.

Another reason the Nobel Prizes haven’t been easy to sideline is that the sciences’ ‘blade of grass’ impression is strongly grounded in history, with help from notions like the idea that scientific knowledge spreads from the Occident to the Orient.

Who’s the first person that comes to mind when I say “Nobel Prize for physics”? I bet it’s Albert Einstein. He was so great that his stature as a physicist has over the decades transcended his human identity and stamped the Nobel Prize he won in 1921 with an indelible mark of credibility. Now, to win a Nobel Prize in physics is to stand alongside Einstein himself.

This union between a prize and its laureate isn’t unique to the Nobel Prize or to Einstein. As I’ve said before, prizes are elevated by their winners. When Margaret Atwood wins the Booker Prize, it’s better for the prize than it is for her; when Isaac Asimov won a Hugo Award in 1963, near the start of his career, it was good for him, but it was good for the prize when he won it for the sixth time in 1992 (the year he died). The Nobel Prizes also accrued a substantial amount of prestige this way at a time when it wasn’t much of a problem, apart from the occasional flareup over ignoring deserving female candidates.

That their laureates have almost always been from Europe and North America further cemented the prizes’ impression that they’re the ultimate signifier of ‘having made it’, paralleling the popular undercurrent among postcolonial peoples that science is a product of the West and that they’re simply its receivers.

That said, the prize-as-proxy issue has contributed considerably as well to preserving systemic bias at the national and international levels. Winning a prize (especially a legitimate one) accords the winner’s work a modicum of credibility and the winner, prestige. Depending on how the winners of a future edition of a prize are to be selected, such credibility and prestige could be potentiated to skew the prize in favour of people who have already won other prizes.

For example, a scientist-friend ranted to me about how, at a conference he had recently attended, another scientist on stage had introduced himself to his audience by mentioning the impact factors of the journals he’d had his papers published in. The impact factor deserves to die because, among other reasons, it attempts to condense multi-dimensional research efforts and the vagaries of scientific publishing into a single number that stands for some kind of prestige. But its users should be honest about its actual purpose: it was designed so evaluators could take one look at it and decide what to do about a candidate to whom it corresponded. This isn’t fair – but expeditiousness isn’t cheap.

And when evaluators at different rungs of career advancement privilege the impact factor, scientists with more papers published earlier in their careers in journals with higher impact factors become exponentially likelier over time to be recognised for their efforts than others – probably even irrespective of the work’s quality, given the unique failings of high-IF journals, discussed here and here.

Brian Skinner, a physicist at Ohio State University, recently presented a mathematical model of this ‘prestige bias’, whose amplification depended in a unique way, according to him, on a factor he called the ‘examination precision’. He found that the more ambiguously defined the barrier to advancement is, the more pronounced the prestige bias could get. Put another way, people who have the opportunity to maintain systemic discrimination simultaneously have an incentive to make the points of entry into their club as vague as possible. Sound familiar?
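To get a feel for the claim, here is a toy illustration I put together – it is not Skinner’s actual model – in which an evaluator combines a candidate’s prestige with a noisy exam score, leaning on prestige more heavily as the exam gets less precise:

```python
import random

# Toy illustration of prestige bias and 'examination precision'. Two candidates
# of identical true ability apply; one carries a 'prestige' prior (say, a famous
# lab). The evaluator combines the prior with a noisy exam score, weighting the
# prior more when the exam is less precise (a crude Bayesian-style weighting).
def selection_rate(exam_noise, prior_gap=1.0, prior_sd=1.0, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins_for_prestigious = 0
    # weight on the prior grows as the exam gets noisier
    w = exam_noise**2 / (exam_noise**2 + prior_sd**2)
    for _ in range(trials):
        ability = rng.gauss(0, 1)                      # same true ability for both
        exam_a = ability + rng.gauss(0, exam_noise)    # prestigious candidate's exam
        exam_b = ability + rng.gauss(0, exam_noise)    # other candidate's exam
        score_a = w * prior_gap + (1 - w) * exam_a     # prior favours candidate A
        score_b = w * 0.0 + (1 - w) * exam_b
        wins_for_prestigious += score_a > score_b
    return wins_for_prestigious / trials

# With a precise exam the two equally able candidates win about equally often;
# as the exam gets vaguer, the prestigious candidate wins nearly every time.
for noise in (0.1, 1.0, 3.0):
    print(f"exam noise {noise}: prestigious candidate selected "
          f"{selection_rate(noise):.0%} of the time")
```

The vaguer the exam, the more often the candidate with the prestige prior wins despite the two being equally able – the qualitative relationship described above.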

One might argue that the Nobel Prizes are awarded to people at the end of their careers – the average age of a physics laureate is in the late 50s; John Goodenough won the chemistry prize this year at 97 – so the prizes couldn’t possibly increase the likelihood of a future recognition. But the sword cuts both ways: the Nobel Prizes are likelier than not to be products of prestige-bias amplification themselves, and are therefore not the morally neutral symbols of excellence Hansson and his peers seem to think they are.

Fourth, the Nobel Prizes are an occasion to speak of science. This implies that those who would deride the prizes but at the same time hold them up are equally to blame, but I would agree only in part. This exhortation to try harder is voiced more often than not by those working in the West, at publications with better resources and typically higher purchasing power. On principle I can’t deride the decisions reporters and editors make in the process of building an audience for science journalism, with the hope that it will be profitable someday, all in a resource-constrained environment, even if some of those choices might seem irrational.

(The story of Brian Keating, an astrophysicist, could be illuminating at this juncture.)

More than anything else, what science journalism needs to succeed is a commonplace acknowledgement that science news is important – whether it’s for the better or the worse is secondary – and the Nobel Prizes do a fantastic job of drawing people’s attention to scientific ideas and endeavours. If anything, journalists should seize the opportunity in October every year to also speak about how the prizes are flawed and present their readers with a fuller picture.

Finally, and of course, we have capitalism itself – implicated in the quantum of prize money accompanying each Nobel Prize (9 million Swedish kronor, Rs 6.56 crore or $0.9 million).

Then again, this figure pales in comparison to the amounts that academic institutions know they can rake in by instrumentalising the prestige in the form of donations from billionaires, grants and fellowships from the government, fees from students presented with the tantalising proximity to a Nobel laureate, and in the form of press coverage. L’affaire Epstein even demonstrated how it’s possible to launder a soiled reputation by investing in scientific research because institutions won’t ask too many questions about who’s funding them.

The Nobel Prizes are money magnets, and this is also why winning a Nobel Prize is like winning an Academy Award: you don’t get on stage without some lobbying. Each blade of grass has to mobilise its own PR machine, supported in all likelihood by the same institute that submitted its candidature to the laureate-selection committee. The Nature editorial called this out thus:

As a small test case, Nature approached three of the world’s largest international scientific networks that include academies of science in developing countries. They are the International Science Council, the World Academy of Sciences and the InterAcademy Partnership. Each was asked if they had been approached by the Nobel awarding bodies to recommend nominees for science Nobels. All three said no.

I believe arguments that serve to uphold the Nobel Prizes’ relevance must take recourse to at least one of these reasons, if not all of them. It’s also abundantly clear that the Nobel Prizes are important not because they present a fair or useful picture of scientific excellence but in spite of the fact that they don’t.

The case for preprints

Daniel Mansur, the principal investigator of a lab at the Universidade Federal de Santa Catarina that studies how cells respond to viruses, had this to say about why preprints are useful in an interview to eLife:

Let’s say the paper that we put in a preprint is competing with someone and we actually have the same story, the same set of data. In a journal, the editors might ask both groups for exactly the same sets of extra experiments. But then, the other group that’s competing with me works at Stanford or somewhere like that. They’ll order everything they need to do the experiments, and the next day three postdocs will be working on the project. If there’s something that I don’t have in the lab, I have to wait six months before starting the extra experiments. At least with a preprint the work might not be complete, but people will know what we did.

Preprints level the playing field by eliminating one’s “ability to publish” in high-IF journals as a meaningful measure of the quality of one’s work.

While this makes it easier for scientists to compete with their better-funded peers, my indefatigable cynicism suggests there must be someone out there who’s unhappy about this. Two kinds of people come immediately to mind: journal publishers and some scientists at highfalutin universities like Stanford.

Titles like Nature, Cell, the New England Journal of Medicine and Science, and especially those published by the Elsevier group, have ridden the impact factor (IF) wave to great profit over many decades. In fact, IF continues to be the dominant mode of evaluating research quality because it’s easy and not time-consuming, so – given how IF is defined – these journals continue to be important for being important. They also provide a valuable service – double-blind peer review, which Mansur thinks is the only thing preprints are currently lacking. But other than that (and with post-publication peer-review being largely suitable), their time of obscene profits is surely running out.
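For reference, the standard two-year impact factor is just a ratio, which is part of what makes it so easy to lean on; a minimal sketch with made-up numbers:

```python
# Standard two-year journal impact factor: citations received in year Y to items
# the journal published in Y-1 and Y-2, divided by the number of citable items
# it published in those two years. The figures below are invented.
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    return citations_to_prev_two_years / citable_items_prev_two_years

# e.g. a journal with 21,000 such citations and 500 citable items gets an IF of 42
print(impact_factor(21_000, 500))  # 42.0
```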

The pro-preprint trend in scientific publishing is also bound to have jolted some scientists whose work received a leg-up by virtue of their membership in elite faculty groups. Like Mansur says, a scientist from Stanford or a similar institution can no longer claim primacy, or uniqueness, by default. As a result, preprints definitely improve the forecast for good scientists working at less-regarded institutions – but an equally important consideration would be whether preprints also diminish the lure of fancy universities. They do have one less thing to offer now, or at least in the future.

Priggish NEJM editorial on data-sharing misses the point it almost made

The editorial expresses fear that people who publish in the journal’s pages could be wrong – cleanly forgetting that replication and revalidation are a big part of science.

Twitter outraged like only Twitter could on January 22 over a strange editorial that appeared in the prestigious New England Journal of Medicine, calling for medical researchers to not make their research data public. The call comes at a time when the scientific publishing zeitgeist is slowly but surely shifting toward journals requiring, sometimes mandating, the authors of studies to make their data freely available so that their work can be validated by other researchers.

Through the editorial, written by Dan Longo and Jeffrey Drazen, both doctors and the latter the chief editor, NEJM also cautions medical researchers to be on the lookout for ‘research parasites’, a coinage that the journal says is befitting “of people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited”. As @omgItsEnRIz tweeted, do the authors even science?

https://twitter.com/martibartfast/status/690503478813261824

The choice of words is more incriminating than the overall tone of the text, which also tries to express the more legitimate concern of replicators not getting along with the original performers. However, by saying that the ‘parasites’ may “use the data to try to disprove what the original investigators had posited”, NEJM has crawled into an unwise hole of infallibility of its own making.

In October 2015, a paper published in the Journal of Experimental Psychology pointed out why replication studies are probably more necessary than ever. The misguided publish-or-perish impetus of scientific research, together with publishing in high impact-factor journals being lazily used as a proxy for ‘good research’ by many institutions, has led researchers to hack their results – i.e. prime them (say, by cherry-picking) so that the study ends up reporting sensational results when, really, duller ones exist.

The JEP paper had a funnel plot to demonstrate this. Quoting from the Neuroskeptic blog, which highlighted the plot when the paper was published, “This is a funnel plot, a two-dimensional scatter plot in which each point represents one previously published study. The graph plots the effect size reported by each study against the standard error of the effect size – essentially, the precision of the results, which is mostly determined by the sample size.” Note: the y-axis is running top-down.

The funnel plot from the JEP paper, showing each study’s effect size against its standard error.

The paper concerned itself with 43 previously published studies discussing how people’s choices were perceived to change when they were gently reminded about sex.

As Neuroskeptic goes on to explain, there are three giveaways in this plot. One is obvious – that the distribution of replication studies is markedly separated from that of the original studies. Second: the least precise results from the original studies worked with the larger sample sizes. Third: the original studies all seemed to “hug” the outer edge of the grey triangles, which represents a statistical measure responsible for indicating if some results are reliable. The uniform ‘hugging’ is an indication that all those original studies were likely guilty of cherry-picking from their data to conclude with results that are just about reliable, an act called ‘p-hacking’.
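For anyone who wants to see how such a plot is put together, here is a small sketch with invented numbers that mimic the pattern described above – imprecise original studies hugging the significance boundary, precise replications scattered around zero:

```python
import numpy as np
import matplotlib.pyplot as plt

# Each study is one point: effect size on the x-axis, standard error on the
# y-axis, with the y-axis inverted so the most precise studies sit at the top.
rng = np.random.default_rng(0)

se_orig = rng.uniform(0.2, 0.5, 20)        # small, imprecise original studies
effect_orig = 0.05 + 1.8 * se_orig         # effects just big enough to look significant
se_rep = rng.uniform(0.05, 0.15, 15)       # larger, more precise replications
effect_rep = rng.normal(0.0, se_rep)       # effects scattered around zero

plt.scatter(effect_orig, se_orig, label="original studies")
plt.scatter(effect_rep, se_rep, label="replications")
plt.gca().invert_yaxis()                   # top-down y-axis, as in the plot above
plt.xlabel("effect size")
plt.ylabel("standard error")
plt.legend()
plt.show()
```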

A line of research can appear to progress rapidly but without replication studies it’s difficult to establish if the progress is meaningful for science – a notion famously highlighted by John Ioannidis, a professor of medicine and statistics at Stanford University, in his two landmark papers in 2005 and 2014. Björn Brembs, a professor of neurogenetics at the Universität Regensburg, Bavaria, also pointed out how the top journals’ insistence on sensational results could result in a congregation of unreliability. Together with a conspicuous dearth of systematically conducted replication studies, this ironically implies that the least reliable results are often taken the most seriously thanks to the journals they appear in.

The most accessible sign of this is a plot of journals’ retraction index against their impact factor. The term ‘retraction index’ was coined in the same paper in which the plot first appeared; it stands for “the number of retractions in the time interval from 2001 to 2010, multiplied by 1,000, and divided by the number of published articles with abstracts”.
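The arithmetic behind the index is simple; here’s a quick sketch using a hypothetical journal’s numbers:

```python
# Retraction index, as defined above: retractions between 2001 and 2010,
# multiplied by 1,000, divided by the number of published articles with abstracts.
def retraction_index(retractions_2001_2010, articles_with_abstracts):
    return retractions_2001_2010 * 1000 / articles_with_abstracts

# A hypothetical journal with 25 retractions and 30,000 articles with abstracts
print(retraction_index(25, 30_000))  # ~0.83
```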

Impact factor of journals plotted against the retraction index. The highest IF journals – Nature, Cell and Science – are farther along the trend line than they should be. Source: doi: 10.1128/IAI.05661-11

Look where NEJM is. Enough said.

The journal’s first such supplication appeared in 1997, arguing against pre-print copies of medical research papers becoming freely and easily accessible – à la the arXiv server for physics. The authors then, again two doctors, wrote: “medicine is not physics: the wide circulation of unedited preprints in physics is unlikely to have an immediate effect on the public’s well-being even if the material is biased or false. In medicine, such a practice could have unintended consequences that we all would regret.” Though a reasonable PoV, the overall tone appeared to stand against the principles of open science.

More importantly, both editorials, separated by almost two decades, make one reasonable argument that sadly appears to make sense to the journal only in the context of a wider set of arguments, many of them contemptible. For example, Drazen seems to understand the importance of data being available for studies to be validated but has differing views on different kinds of data. Two days before his editorial was published, another appeared in the same journal, co-authored by 16 medical researchers – Drazen among them – this time calling for anonymised patient data from clinical trials to be made available to other researchers because it would “increase confidence and trust in the conclusions drawn from clinical trials. It will enable the independent confirmation of results, an essential tenet of the scientific process.”

(At the same time, the editorial also says, “Those using data collected by others should seek collaboration with those who collected the data.”)

For another example, NEJM labours under the impression that the data generated by medical experiments will never be perfectly communicable to researchers who were not involved in generating it. One reason it provides is that discrepancies in the data between the original group and a new group could arise because of subtle choices the former made in selecting the parameters to evaluate. However, the solution doesn’t lie in keeping the data opaque altogether.

A better way to conduct replication studies

An instructive example played out in May 2014, when the journal Social Psychology published a special issue dedicated to replication studies. The issue contained both successful and failed attempts at replicating some previously published results, and the whole process was designed to eliminate biases as much as possible. For example, the journal’s editors Brian Nosek and Daniel Lakens didn’t curate replication studies but instead registered the studies before they were performed so that their outcomes would be published irrespective of whether they turned out positive or negative. For another, all the replications used the same experimental and statistical techniques as in the original study.

One scientist who came out feeling wronged by the special issue was Simone Schnall, the director of the Embodied Cognition and Emotion Laboratory at Cambridge University. The results of a paper co-authored by Schnall in 2008 had failed to be replicated, but she believed there had been a mistake in the replication that, when corrected, would corroborate her group’s findings. However, her statements were quickly and widely interpreted to mean she was being a “sore loser”. In one blog, her 2008 findings were called an “epic fail” (though the words were later struck out).

This was soon followed by a rebuttal from Schnall, then a counter by the replicators, and then two blog posts by Schnall (here and here). Over time, the core issue became how replication studies were conducted – who performed the peer review, the level of independence the replicators had, the level of access the original group had, and how journals could be divorced from having a choice about which replication studies to publish. But relevant to the NEJM context, the important thing was the level of transparency maintained by Schnall & co. as well as the replicators, which lent honesty and legitimacy to the debate.

The Social Psychology issue was able to take the conversation forward, getting authors to talk about the psychology of research reporting. There have been few other such instances – exercises exploring the proper mechanisms of replication studies – so if the NEJM editorial had stopped at calling for better-organised collaborations between a study’s original performers and its replicators, it would’ve been great. As Longo and Drazen concluded, “How would data sharing work best? We think it should happen symbiotically … Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested.”

https://twitter.com/significantcont/status/690507462848450560

The mistake lies in thinking anything else would be parasitic. And the attitude affects not just other scientists but some science communicators as well. Any journalist or blogger who has been reporting on a particular beat for a while stands to become a ‘temporary expert’ on the technical contents of that beat. And with exploratory/analytical tools like R – which is easier than you think to pick up – the communicator could dig deeper into the data, teasing out issues more relevant to their readers than what the accompanying paper thinks is the highlight. Sure, NEJM remains apprehensive about how medical results could be misinterpreted to terrible consequence. But the solution there would be for the communicators to be more professional and disciplined, not for the journal to be more opaque.

The Wire
January 24, 2016