
The overlay bias

I’m not very fond of some highly popular pieces of writing (I won’t name them because I’m nervous about backlash from authors and/or their supporters) because a part of their popularity is undeniably rooted in technological ‘solutions’ that asymmetrically promote work published in the solution’s country of origin.

My favourite example is Pocket, the app that lets users save copies of articles to read later, offline if required. Not long ago, Pocket introduced an extension for the Google Chrome browser (which counts hundreds of millions of users) such that every time you opened a new tab, it would show you three articles that lots of other Pocket users had read and liked. It’s fairly brainless, ergo presumably non-malicious, and you’d expect the results to be distributed evenly among magazines, journals, etc. published around the world.

However, nine times out of ten, and often more, I’d find articles by NYT, The Atlantic, The Baffler, etc. there. I was reluctant to blame Pocket at first, considering their algorithm seemed too simple, but then I realised Pocket was just the last in a long line of apps and algorithms that simply amplified existing biases.

Before Pocket, for example, there might have been Twitter, Facebook or some other platform that allowed stories from some domains to persist for longer on users’ feeds because they were more easily perceived to be legitimate than articles from other sources, say, a Venezuelan newspaper, a Kenyan blog, a Pakistani magazine or a Vietnamese journal. Or there might have been Nuzzle, which auto-compiles a digest of the articles that your friends on social media have shared most – likely unmindful of the fact that people quite often share headlines, or domains they’d like to be known to be reading, instead of the articles themselves.

This is a social magnification like the biological magnification in nature, whereby toxic substances pile up in greater quantities in the gizzards of animals higher up in the food chain. Here, perceptions of legitimacy and quality accumulate in greater quantities in the feeds and timelines of people who consume, or even glance through, the most information. And this way, a general consciousness of what’s considered desirable erects itself without anything drastic, with just the more fleeting and mindless actions of millions of people, into a giant wheel of information distribution that constantly feeds itself its own momentum.

As the wheel turns, and The Atlantic publishes an article, it doesn’t just publish a good article that draws hundreds of thousands of readers. It also rides a wheel set in motion by American readers, American companies, American developers, American interests and American dollars, with a dollop of historical imperialism, that quietly but surely brings the world a good article plus a good-natured reminder that The Atlantic is good and that readers needn’t go looking for anything else because The Atlantic has them covered.

As I wondered in 2017, and still do: “Will my peers in India have been farther along in their careers had there been an equally influential Indian for-publishers tech stack?” Then again, how much is one more amplifier, Pocket or anything else, going to change?

I went into this tirade because of this Twitter thread, which describes a similar issue with arXiv – the popular preprint repo for physical sciences, computer science and applied mathematics papers (don’t @ me to quibble over arXiv’s actual remit). As the tweeter Jia-Bin Huang writes, the manuscripts that were uploaded last – i.e. most recently – to arXiv are displayed on top of the output stack, and what’s displayed on top of the stack gets more citations and readership.

This is a very simple algorithm, quite like Pocket’s algorithm, but in both cases they’re algorithms overlaid on existing bias-amplifying architectures. In a sense, they’re akin to the people who might stand by and watch a lynching, neither egging the perpetrators on nor stopping them. If the metaphor is brutal, remember that the effects on any publication or scientist that can’t infiltrate or ‘hack’ social biases are brutal as well. While their contents and their ideas might deserve international readership, these publications and scientists will need to spend more – energy, resources, effort – to grab international attention again and again.

The example Jia-Bin Huang cites is of scientists in Asia, who – unlike their American counterparts – can’t upload a paper on arXiv just before the deadline so that their papers sit on top of the stack because 2 pm in New York is 3 am in Taipei.
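The time-zone arithmetic is easy to check. The sketch below assumes fixed winter offsets (UTC-5 for New York, UTC+8 for Taipei) and a made-up date; the function name is hypothetical and none of this comes from arXiv itself.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed winter offsets, ignoring daylight saving time.
NEW_YORK_EST = timezone(timedelta(hours=-5), "EST")
TAIPEI = timezone(timedelta(hours=8), "Taipei")


def deadline_in_taipei(year, month, day, hour=14):
    """Convert a New York (EST) wall-clock deadline to Taipei local time."""
    deadline_ny = datetime(year, month, day, hour, 0, tzinfo=NEW_YORK_EST)
    return deadline_ny.astimezone(TAIPEI)


# Hypothetical date, chosen only for illustration.
local = deadline_in_taipei(2019, 11, 25)
print(local.strftime("%Y-%m-%d %H:%M"))  # 03:00 on the following day
```

The two cities sit 13 hours apart in winter, so a 2 pm cut-off in New York lands at 3 am the next day in Taipei.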

As some replies to the thread indicated, the people maintaining arXiv can easily solve the problem by waiting for the deadline to pass, then randomising the order of papers displayed in its email blast – but as Jia-Bin Huang notes, doing that would mean negating the just-in-time advantage that arXiv’s American users enjoy. So here we are.

It isn’t hard to see how we can extend the same suggestion to the world’s Pockets and Nuzzles. Pick your millions of users’ thousand most-read articles, mix up their order – even weigh down popular American publishers if necessary – and finally advertise the first ten items from this list. But ultimately, until technological solutions actively negate the biases they overlie, Pocket will lie on the same spectrum as the tools that produce the biases. I admit fact-checking in this paradigm could be labour-intensive, as could relevance-checking vis-à-vis arXiv, but I also think the latter would be better problems to solve.
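To make the suggestion concrete, here is a minimal sketch of such a shuffle. It is not Pocket’s or Nuzzle’s actual algorithm: the function name, the article fields and the `penalty` value are all hypothetical, and the weighting (a random key raised to the power 1/weight) is just one reasonable way to do a weighted shuffle.

```python
import random


def pick_recommendations(articles, downweighted, penalty=0.5,
                         pool_size=1000, k=10, seed=None):
    """Take the most-read articles, shuffle them with weights, return k.

    articles: list of dicts with "title", "publisher" and "reads" keys
              (a hypothetical schema).
    downweighted: set of publisher names to weigh down.
    penalty: weight in (0, 1] given to downweighted publishers.
    """
    rng = random.Random(seed)
    # Step 1: keep only the most-read articles.
    pool = sorted(articles, key=lambda a: a["reads"], reverse=True)[:pool_size]

    # Step 2: weighted shuffle. Each article draws a random key u**(1/w);
    # a smaller weight w pushes the key towards 0, i.e. down the list.
    def shuffle_key(article):
        weight = penalty if article["publisher"] in downweighted else 1.0
        return rng.random() ** (1.0 / weight)

    # Step 3: advertise the first k items of the shuffled list.
    return sorted(pool, key=shuffle_key, reverse=True)[:k]


# Made-up data for illustration only.
articles = [
    {"title": f"Story {i}",
     "publisher": "US magazine" if i % 2 == 0 else "Other outlet",
     "reads": 1000 - i}
    for i in range(20)
]
picks = pick_recommendations(articles, {"US magazine"}, k=10, seed=42)
```

With `penalty=1.0` this reduces to a plain uniform shuffle of the most-read pool, which is the arXiv-style fix described above.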


To see faces where there are none

This week in “neither university press offices nor prestigious journals know what they’re doing”: a professor emeritus at Ohio University who claimed he had evidence of life on Mars, and whose institution’s media office crafted a press release without thinking twice to publicise his ‘findings’, and the paper that Nature Medicine published in 2002, cited 900+ times since, that has been found to contain multiple instances of image manipulation.

I’d thought the professor’s case would remain obscure because it’s evidently crackpot but this morning, articles from outlets including Universe Today showed up on my Twitter feed setting the record straight: that the insects the OU entomologist had found in pictures of Mars taken by the Curiosity rover were just artefacts of his (insectile) pareidolia. Some people have called this science journalism in action but I’d say it’s somewhat offensive to check if science journalism still works by gauging its ability, and initiative, to counter conspiracy theories, the lowest of low-hanging fruit.

The press release, which has since been taken down. Credit: EurekAlert and Wayback Machine

The juicier item on our plate is the Nature Medicine paper, the problems in which research integrity super-sleuth Elisabeth Bik publicised on November 21, and which has a science journalism connection as well.

Remember the anti-preprints article Nature News published in July 2018? Its author, Tom Sheldon, a senior press manager at the Science Media Centre, London, argued that preprints “promoted confusion” and that journalists who couldn’t bank on peer-reviewed work ended up “misleading millions”. In other words, it would be better if we got rid of preprints and journalists deferred only to the authority of peer-reviewed papers curated and published by journals, like Nature. Yet here we are today, with a peer-reviewed manuscript published in Nature Medicine whose checking process couldn’t pick up on repetitive imagery. Is this just another form of pareidolia, to see a sensational result – knowing prestigious journals’ fondness for such results – where there was actually none?

(And before you say this is just one paper, read this analysis: “… data from several lines of evidence suggest that the methodological quality of scientific experiments does not increase with increasing rank of the journal. On the contrary, an accumulating body of evidence suggests the inverse: methodological quality and, consequently, reliability of published research works in several fields may be decreasing with increasing journal rank.” Or this extended critique of peer-review on Vox.)

This isn’t an argument against the usefulness, or even need for, peer-review, which remains both useful and necessary. It’s an argument against ludicrous claims that peer-review is infallible, advanced in support of the even more ludicrous argument that preprints should be eliminated to enable good journalism.


Preference for OA research by income group

Two researchers from Rwanda performed a “systematic computational analysis of the biomedical literature” and concluded in their paper that:

… papers with authors based in sub-Saharan Africa, papers with authors based in low income countries, and papers resulting from international collaboration are all much more likely to be made openly accessible than papers that don’t have these properties.

They analysed 547,404 papers indexed in PubMed, which is:

… a free resource developed and maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). PubMed provides free access to MEDLINE, NLM’s database of citations and abstracts in the fields of medicine, nursing, dentistry, veterinary medicine, health care systems, and preclinical sciences.


The researchers also found that after scientists from low-income countries, those in high-income countries exhibited the next highest preference for publishing in open-access (OA) journals, and that scientists from lower and upper middle-income countries – such as India – came last. It is important to acknowledge here that while there exists a marked (inverse) correlation between GDP per capita and number of publications in OA journals, causation might be harder to pin down because GDP figures are influenced by a large array of factors.

At the same time, given the strength of the correlation, their conclusion – about scientists from middle-income countries being associated with the fewest OA papers in their sample – seems curious. The article processing charge (APC) levied by some journals to make a paper openly accessible immediately after publishing is only marginally more affordable in middle-income countries than it is in low-income countries. However, the effects of technology and initiative seem to allay some of this confusion.

There are two popular ways, or routes, to publish OA papers. In the ‘gold’ route, the authors of a paper pay the APC to the journal, which in turn makes the paper openly accessible once it is published. A common example is PLOS One, whose APC is at the lower end, $1,595 (Rs 1.13 lakh). On the other hand, Nature Communications charges a stunning EUR 4,290 (Rs 3.4 lakh) per paper for submissions from India. In the ‘green’ route, the authors or publishers upload the paper to a publicly accessible repository apart from formally publishing it. A common example is the arXiv preprints server, which is moderated by volunteers.

There is also ‘hybrid’ OA, whereby a part of the journal’s contents is openly available and the rest is behind a paywall. In one review published in February 2018, researchers also pointed out a ‘bronze’ route: “articles made free-to-read on the publisher website” but “without an explicit [OA] license”.

The authors of the current paper reason that researchers from high-income countries might be ranking higher in their preference for OA papers because the “‘green’ route of OA has been encouraged by an enormous growth in the number of OA repositories, particularly in Europe and North America”; they also note that Africa was home to only 4% of such repositories in 2018. In the same vein, they continue, “the vast majority of funding organizations with OA policies as of 2018 were based in Europe and North America, with less than 3% of total OA policies originating from organizations based in Africa”.

Additionally, many journals frequently waive APCs for submissions from authors in low-income countries, whereas those from lower- and upper-middle income countries – again, including India – do not qualify as frequently to have their papers published without a fee. A very conservative, back-of-the-envelope estimate suggests India spends at least Rs 600 crore every year as APCs.

It was to reduce this burden that K. VijayRaghavan, the principal scientific adviser to the Government of India, announced earlier this year that India was joining the Plan S coalition of research-funders, which aims to have all research funded by them openly accessible to the public by 2021. As a result, researchers funded by Plan S members will have to submit to journals that offer gold/green routes and/or journals will have to make exceptions for publishing research funded by Plan S members.

This is going to take a bit of hammering out because the Plan S concept has many problems. Perhaps the most frustrating among them is its Eurocentric priorities. Other commentators have acknowledged that this limits Plan S’s ability to serve meaningfully the interests of researchers from South/Southeast Asia, Africa and Latin America. In July, two Argentinian researchers lambasted just this aspect and accused Plan S of ignoring “the reality of Latin America”. They wrote that Plan S views “scientific publishing and scholarly publications … as a commodity prone to commercialization” whereas in Latin America, they “are conceived as the community sharing of public goods”.

The latter is more in line with the interests of the developing world as well as with the spirit of knowledge-sharing more generally. At present, a little over 50% of research articles are not openly accessible, although this is changing thanks to the increasing recognition of OA’s merits, including the debatable citation advantage. Research-funders devised Plan S to “accelerate this transition”, as Jon Tennant wrote, but its implementation guidelines need tweaking.

Another problem with Plan S is that it keeps the focus on the ‘gold’ OA route and does little to address many researchers’ bias against less prestigious, but no less credible, journals. For example, while Plan S specifies that it will have gold-OA journals cap their APCs, scientists have said that this would be unenforceable. So, as I wrote in February:

… if Plan S has to work, researcher-funders also have to help reform scientists’ and administrators’ attitude towards notions like prestige. A top-down mandate to publish only in certain journals won’t work if the institutions aren’t equipped, for example, to evaluate research based on factors other than ‘prestige’.

To this end, the study by the researchers in Rwanda offers a useful suggestion: that the presence or absence of policies might not be the real problem.

There was no clear relationship between the number of open access policies in a region and the percentage of open access publications in that region. … The finding that open access publication rates are highest in sub-Saharan Africa and low income countries suggests that factors other than open access policy strongly influence authors’ decisions to make their work openly accessible.


The DNA-based computer that can calculate π

I’m not fond of biology. Of late, however, it’s been harder to avoid encountering it because the frontiers of many fields of research are becoming increasingly multidisciplinary. Biological processes are meshing with physics and statistics, and undergoing the kind of epistemic reimagination that geometry experienced in the 19th and 20th centuries. Now, scientists are able to manipulate biology to do wondrous things.

Consider the work of a team from the Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India, which has figured out a way to compute the value of π using self-assembling strands of DNA. Their work derives from previous successful attempts to perform simple mathematical calculations by nudging these molecules to bind to each other in specific ways, a technique called tile assembly.

The underlying tiling problem was first formulated by the Chinese-born logician and philosopher Hao Wang in 1961. Wang wanted to know if a set of square tiles could cover a plane in a periodic pattern if each tile had four colored edges and only edges of the same color could abut each other. The answer, as it turned out, was that some sets of tiles can cover a plane but only in an aperiodic pattern.

In a DNA tile assembly model (TAM), each tile represents a section of the DNA molecule, called a monomer. When adjacent tiles’ abutting sides line up with the same color, then the two monomers attach themselves across the abutting sides according to a strength corresponding to that color. This way, given a tile to start with – called the seed tile – and a sequence of tiles coming up next, the DNA monomers can link up to form diverse patterns.
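The attachment rule can be sketched in a few lines. This follows the common abstract version of the model, in which a tile sticks only if the combined strength of its matching sides reaches a threshold (often called the temperature). The threshold, the colors and their strengths below are all assumptions for illustration, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    north: str
    east: str
    south: str
    west: str


# Hypothetical glue strengths per color; "" means no glue on that side.
GLUE_STRENGTH = {"red": 2, "blue": 1}

# (offset to neighbour, side of the new tile, facing side of the neighbour)
NEIGHBOURS = [
    ((0, 1), "north", "south"),
    ((1, 0), "east", "west"),
    ((0, -1), "south", "north"),
    ((-1, 0), "west", "east"),
]


def binding_strength(grid, pos, tile):
    """Sum the strengths of all sides whose color matches the neighbour's."""
    x, y = pos
    total = 0
    for (dx, dy), mine, theirs in NEIGHBOURS:
        neighbour = grid.get((x + dx, y + dy))
        if neighbour is None:
            continue
        color = getattr(tile, mine)
        if color == getattr(neighbour, theirs):
            total += GLUE_STRENGTH.get(color, 0)
    return total


def can_attach(grid, pos, tile, temperature=2):
    """A tile attaches to an empty spot if it binds strongly enough."""
    return pos not in grid and binding_strength(grid, pos, tile) >= temperature


seed = Tile(north="red", east="red", south="", west="")
up = Tile(north="", east="blue", south="red", west="")
right = Tile(north="blue", east="", south="", west="red")
corner = Tile(north="", east="", south="blue", west="blue")

grid = {(0, 0): seed}
grid[(0, 1)] = up                                  # one strong 'red' bond
corner_alone = can_attach(grid, (1, 1), corner)    # only one weak 'blue' bond
grid[(1, 0)] = right                               # one strong 'red' bond
corner_pair = can_attach(grid, (1, 1), corner)     # two weak bonds together
```

The corner tile is the interesting case: neither weak bond holds it on its own, but two of them together do, and this kind of cooperation is what lets an assembly encode logic.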

By controlling the sequence of colors and their strengths, scientists can thus use TAM to control the values of variables moving through the resultant grid. Connections of monomers between tiles can be made stronger or weaker, and to different extents, in ways mimicking how the voltages between different electronic components in a computer’s circuit allow it to perform mathematical calculations.

So, Shalin Shah, Parth Dave and Manish Gupta from the Institute used four new variations of TAM that they’d developed to calculate the value of π. Each of these variations performs a specific function, much like the logic gates inside an information processor.

  1. The compare tile system decides which of two numbers is greater, or if they’re equal.
  2. The shift tile system shifts the bits of a number by one bit to the right, and adds a 0 to the leftmost bit. For example, 11001 becomes 01100.
  3. The subtract and shift tile system subtracts one binary number from the other, then shifts the result’s bits by one bit to the right, and finally adds a padding 0 to the leftmost bit.
  4. The insert bit tile system inserts a bit in a number.
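In ordinary software, the four operations would look something like the sketch below, which works on strings of bits. It only mirrors what each tile system computes, not how the tiles compute it. The function names, the fixed-width padding and the position argument of `insert_bit` are assumptions.

```python
def compare(a, b):
    """Return 1, 0 or -1 if binary string a is greater than, equal to
    or less than binary string b."""
    x, y = int(a, 2), int(b, 2)
    return (x > y) - (x < y)


def shift_right(bits):
    """Drop the rightmost bit and pad a 0 on the left."""
    return "0" + bits[:-1]


def subtract_and_shift(a, b):
    """Compute a - b (assumes a >= b), then shift the result right."""
    width = max(len(a), len(b))
    diff = int(a, 2) - int(b, 2)
    if diff < 0:
        raise ValueError("this sketch assumes a >= b")
    return shift_right(format(diff, f"0{width}b"))


def insert_bit(bits, index, bit):
    """Insert a single bit before position index (counted from the left)."""
    return bits[:index] + bit + bits[index:]


print(shift_right("11001"))                  # 01100, as in the example above
print(compare("0110", "0101"))               # 1
print(subtract_and_shift("11001", "00101"))  # 25 - 5 = 20 -> 10100 -> 01010
print(insert_bit("1001", 2, "1"))            # 10101
```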

Using a combination of these systems – all with the TAM at their hearts – the trio has been able to compute the value of π as shown below:

The gray tiles are input tiles, green are addition/subtraction tiles, yellow are copy/duplicate tiles, orange tiles are shift tiles, and blue tiles indicate the remainders of the corresponding division process. The calculation is growing upward and toward the right. Image: Computing Real Numbers using DNA Self-Assembly, Shah et al, Laboratory of Natural Information Processing, DAIICT.

You can see that the calculation is an ongoing infinite series – specifically, the Leibniz series, which estimates π as an infinitely alternating sequence of additions and subtractions between smaller and smaller fractions. Two things follow. First, because the series is infinite, the precision with which the trio’s calculator can find π depends only on how many tiles are available. Second, because the calculator can compute infinite series, any number or problem that can be reduced to summing an infinite series is, in principle, solvable using this calculator.
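For reference, the Leibniz series is π/4 = 1 − 1/3 + 1/5 − 1/7 + … The sketch below sums its first few terms in ordinary software. It says nothing about how the tiles do it, and the function name is made up. Think of the number of terms as a stand-in for the number of tiles available.

```python
import math
from fractions import Fraction


def leibniz_pi(terms):
    """Estimate pi from the first `terms` terms of the Leibniz series."""
    total = Fraction(0)
    for k in range(terms):
        # Terms alternate in sign: +1/1, -1/3, +1/5, -1/7, ...
        total += Fraction((-1) ** k, 2 * k + 1)
    return 4 * total


for n in (1, 10, 100, 1000):
    estimate = float(leibniz_pi(n))
    print(n, estimate, abs(estimate - math.pi))
```

The series converges slowly: the error shrinks roughly in proportion to 1/n, so each extra digit of π needs about ten times as many terms.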

This would merely be a curious yet tedious way to calculate if not for its potential to exploit the biological properties of DNA to enhance the calculator’s abilities. Although this hasn’t been elaborately outlined in the trio’s preprint paper on arXiv, it is plausible that such calculators could be used to guide the development of complex and ever more intricate DNA structures with minimal human intervention, or to fashion molecular logic circuits driving microscopic robots that deliver drugs within our bloodstreams. Studies in the past have already shown that DNA self-assembly is Turing-universal, which means it can perform any calculation that is known to be calculable.

The DNA molecule is itself a wondrous device, existing in nature to store genetic data over tens of thousands of years only for a future inheritor to slowly retrieve information essential for its survival. Scientists have found the molecule can hold 5.5 petabits of data per cubic millimeter, without letting any of it become corrupted for 1 million years if stored at -18 degrees Celsius.