Berkman Center

From Orwell to Kafka, Markov to Doctorow: Understanding Big Data through metaphors

Vasudevan Mukunth

29 Mar 2015 — 7 min read

On March 20, I attended a short talk by Malavika Jayaram, a fellow at the Berkman Center for Internet & Society, titled ‘What we talk about when we talk about Big Data’ at the T.A.J. Residency in Bengaluru. It was something of an initiation into the social and political contexts of Big Data and its usage, and the important ethical conundrums assailing these contexts.

Even if it was a little slow during the first 15 minutes, Jayaram’s talk progressed rapidly later on as she quickly piled criticism after criticism upon the concept’s foundation, which was quickly being revealed to be immature. Perhaps those familiar with Jayaram’s past research did (or didn’t) find the contents of her talk to contain more nuances than she’s let on before, but to me it revealed an array of perspectives I’ve remained balefully ignorant of.

The first in line was about the metaphors used to describe Big Data – and how our use of metaphors at all betrays our inability to comprehend Big Data in its entirety. Jayaram quoted at length but loosely from an essay by Sara M. Watson, her colleague at Berkman, titled Data is the new “____”. It describes how the dominant metaphors are industrial, dealing with the data itself as if it were a natural resource and the process of analyzing it as if it were being mined or refined.

Data as a natural resource suggests that it has great value to be mined and refined but that it must be handled by experts and large-scale industrial processes. Data as a byproduct describes the transactional traces of digital interactions but suggests it is also wasteful, pollutive, and may not be meaningful without processing. Data has also been described as a fungible resource, as an asset class, suggesting that it can be traded, stored, and protected in a data vault. One programmatic advertising professional related to me that he thinks “data is the steel of the digital economy,” an image that avoids the negative connotations of oil while at the same time expressing concern about monopolizing forces of firms Google and Facebook.

Not Orwellian but Kafkaesque

There are two casualties of this perspective. The first is the people behind the data – those whose features, actions, choices, etc. have become numbers – are forgotten even as the data they have given “birth” to becomes more important and valuable. The second casualty is the constant reminder that data is valuable, and large amounts of data more so, condemning it to a life where it can’t hope to be stagnant for long.

The dehumanization of Big Data, according to Jayaram, extends beyond analysts forgetting the data belongs to faces and names and unto the restriction of personal ownership. The people the data represents often don’t have access to it. This implies an existential anxiety quite unlike found in George Orwell’s 1984 and more like the one in Franz Kafka’s The Trial. In Jayaram’s words,

You are in prison awaiting your trial. Suddenly you find out the trial has been postponed and you have no idea why or how. There seem to be people who know things that you never will. You don’t know what you can do to encourage their decisions to keep the trial permanently postponed. You don’t know what it was about you and you have no way of changing your behavior accordingly.

In 2013, American attorney John Whitehead popularized this comparison in an article titled Kafka’s America. Whitehead argues that the sentiments of Josef K., the protagonist of The Trial, are increasingly becoming the sentiments of a common American.

Josef K’s plight, one of bureaucratic lunacy and an inability to discover the identity of his accusers, is increasingly an American reality. We now live in a society in which a person can be accused of any number of crimes without knowing what exactly he has done. He might be apprehended in the middle of the night by a roving band of SWAT police. He might find himself on a no-fly list, unable to travel for reasons undisclosed. He might have his phones or internet tapped based upon a secret order handed down by a secret court, with no recourse to discover why he was targeted. Indeed, this is Kafka’s nightmare, and it is slowly becoming America’s reality.

Kafka-biographer Reiner Stach summed up these activities as well as the steadily unraveling realism of Kafka’s book as proof of “the extent to which power relies on the complicity of its victims” – and the ‘evil’ mechanism used to achieve this state is a concern that Jayaram places among the prime contemporary problems threatening civil liberties.

If your hard drive’s not in space…

There is an added complication. If the use of Big Data was predominantly suspect, it would have been easier to build consensus against its abuse. However, that isn’t the case: Big Data is more often than not used in ways that don’t harm our personal liberties, and the misfortune is that their collective beneficence as yet has been no match for the collective harm some of its misuses have achieved. Could this be because the potential for its misuse is almost everywhere?

Yes. An often overlooked facet of using Big Data is the idea that the responsible use of Big Data is not a black-and-white deal. Facebook is not all evil and academic ethnographers are not all benign. Zuckerberg’s social network may collect and store large amounts of information that it nefariously trades with advertisers – and may even comply with the NSA’s “requests” – but there is a systematicity, an orderliness, with which the data is being passed around. The complex’s existence alone presents a problem, no doubt, but that there is a complex at all makes it easier to attempt to fix the problem than if the orderliness were absent.

And this orderliness is often absent among academicians, scholars, journalists, etc., who may not think data is a dollar note but at the same time are processing prodigious amounts of it without being as careful as is necessary about how they are logging, storing and sharing it. Jayaram rightly believes that even if information is collected for benevolent purposes, the moment it becomes data it loses its memory and stays on on the Internet as data; that if we are to be responsible data-scientists, being benevolent alone will be inadequate.

To drive the point home, she recalled a comment someone had made to her during a data workshop.

The Utopian way to secure data is to shoot your hard drive into space.

Every other recourse will only fall short.

Consent is not enough

This memoryless, Markovian character of the data-economy demands a redefinition of consent as well. The question “What is consent?” is dependent on what a person is consenting to. However, almost nobody knows how the data will be used, what for, or over what time-frames. Like a variable flowing through different parts of a computer, data can pass through a variety of contexts to each of which it provides value of varying quality. So, the same question of contextual integrity should retrospectively apply to the process of consent-giving as well: What are we consenting to when we’re consenting to something?

And when both the party asking for consent and the party asked for consent can’t know all the ways in which the data will be used, the typical way-out has been to seek consent that protects one against harm – either by ensuring that one’s civil liberties are safeguarded or by explicitly prohibiting choices that will impinge upon, again, one’s civil liberties. This has also been increasingly done in a one-size-fits-all manner that the average citizen doesn’t have the bargaining power to modify.

However, it’s become obvious by now that just protecting these liberties isn’t enough to ensure that data and consent are both promised a contextual integrity.

Why not? Because the statutes that enshrine many of these liberties is yet to be refashioned for the Internet age. In India, at least, the six fundamental rights are to equality, to freedom, against exploitation, to freedom of religion, cultural and educational rights, and to constitutional remedies. Between them, the promise of protecting against the misuse of not one’s person but one’s data is tenuous (although a recent document from the Telecom Regulatory Authority of India could soon fix this).

The Little Brothers

Anyway, an immediate consequence of this typical way-out has been that one needs to be harmed to get remedy, at a time when it remains difficult to define when one’s privacy has been harmed. And since privacy has been an enabler of human rights, even unobtrusive acts of tagging and monitoring that don’t violate the law can force compliance among the people. This is what hacker Andrew Huang talks about in his afterword to Cory Doctorow’s novel Little Brother (2008),

[In] January 2007, … Boston police found suspected explosive devices and shut down the city for a day. These devices turned out to be nothing more than circuit boards with flashing LEDs, promoting a show for the Cartoon Network. The artists who placed this urban graffiti were taken in as suspected terrorists and ultimately charged with felony; the network producers had to shell out a $2 million settlement, and the head of the Cartoon Network resigned over the fallout.

Huang’s example further weakens the Big Brother metaphor by implicating not one malevolent central authority but an epidemic, Kafkaesque paranoia that has “empowered” a multitude of Little Brothers all convinced that God is only in the detail.

While Watson’s essay (Data is the new “____”) is explicit about the power of metaphors to shape public thought, Doctorow’s book and Huang’s afterword take the next logical step in that direction and highlight the clear and present danger for what it is.

It’s not the abuse of power by one head of state but the evolution of statewide machines that (exhibit the potential to) exploit the unpreparedness of the times to coerce and compel, using as their fuel the mountainous entity – sometimes as Gargantuan as to be formless, and sometimes equally absurd – called Big Data (I exaggerate – Jayaram was more measured in her assessments – but not much).

And even if Whitehead and Stach only draw parallels between The Trial and American society, the relevant, singular “flaw” of that society exists elsewhere in the world, too: the more we surveil others, the more we’ll be surveilled ourselves, and the longer we choose to stay ignorant of what’s happening to our data, the more our complicity in its misuse. It is a bitter pill to swallow.

Featured image credit: DARPA

From Orwell to Kafka, Markov to Doctorow: Understanding Big Data through metaphors

Vasudevan Mukunth

Read more

Empathy for Donald Pettit

Right to safe work

"Who are we?"

Happy Lord of the Rings Day