Big Data

  • From Orwell to Kafka, Markov to Doctorow: Understanding Big Data through metaphors

    On March 20, I attended a short talk by Malavika Jayaram, a fellow at the Berkman Center for Internet & Society, titled ‘What we talk about when we talk about Big Data’ at the T.A.J. Residency in Bengaluru. It was something of an initiation into the social and political contexts of Big Data and its usage, and the important ethical conundrums assailing these contexts.

    Even if it was a little slow during the first 15 minutes, Jayaram’s talk picked up pace as she piled criticism after criticism upon the concept’s foundation, which was quickly revealed to be immature. Those familiar with Jayaram’s past research may or may not have found her talk more nuanced than what she has let on before, but to me it revealed an array of perspectives I had remained balefully ignorant of.

    The first in line was about the metaphors used to describe Big Data – and how our use of metaphors at all betrays our inability to comprehend Big Data in its entirety. Jayaram quoted at length but loosely from an essay by Sara M. Watson, her colleague at Berkman, titled Data is the new “____”. It describes how the dominant metaphors are industrial, dealing with the data itself as if it were a natural resource and the process of analyzing it as if it were being mined or refined.

    Data as a natural resource suggests that it has great value to be mined and refined but that it must be handled by experts and large-scale industrial processes. Data as a byproduct describes the transactional traces of digital interactions but suggests it is also wasteful, pollutive, and may not be meaningful without processing. Data has also been described as a fungible resource, as an asset class, suggesting that it can be traded, stored, and protected in a data vault. One programmatic advertising professional related to me that he thinks “data is the steel of the digital economy,” an image that avoids the negative connotations of oil while at the same time expressing concern about the monopolizing forces of firms like Google and Facebook.

    Not Orwellian but Kafkaesque

    There are two casualties of this perspective. The first is that the people behind the data – those whose features, actions, choices, etc. have become numbers – are forgotten even as the data they have given “birth” to becomes more important and valuable. The second casualty is rest: because we are constantly reminded that data is valuable, and large amounts of data more so, it is condemned to a life in which it can’t hope to stay stagnant for long.

    The dehumanization of Big Data, according to Jayaram, extends beyond analysts forgetting that the data belongs to faces and names, to the restriction of personal ownership. The people the data represents often don’t have access to it. This implies an existential anxiety quite unlike the one found in George Orwell’s 1984 and more like the one in Franz Kafka’s The Trial. In Jayaram’s words,

    You are in prison awaiting your trial. Suddenly you find out the trial has been postponed and you have no idea why or how. There seem to be people who know things that you never will. You don’t know what you can do to encourage their decisions to keep the trial permanently postponed. You don’t know what it was about you and you have no way of changing your behavior accordingly.

    In 2013, American attorney John Whitehead popularized this comparison in an article titled Kafka’s America. Whitehead argues that the sentiments of Josef K., the protagonist of The Trial, are increasingly becoming the sentiments of a common American.

    Josef K’s plight, one of bureaucratic lunacy and an inability to discover the identity of his accusers, is increasingly an American reality. We now live in a society in which a person can be accused of any number of crimes without knowing what exactly he has done. He might be apprehended in the middle of the night by a roving band of SWAT police. He might find himself on a no-fly list, unable to travel for reasons undisclosed. He might have his phones or internet tapped based upon a secret order handed down by a secret court, with no recourse to discover why he was targeted. Indeed, this is Kafka’s nightmare, and it is slowly becoming America’s reality.

    The Kafka biographer Reiner Stach summed up these activities, as well as the steadily unraveling realism of Kafka’s book, as proof of “the extent to which power relies on the complicity of its victims” – and Jayaram counts the ‘evil’ mechanism used to achieve this state among the prime contemporary threats to civil liberties.

    If your hard drive’s not in space…

    There is an added complication. If the use of Big Data were predominantly suspect, it would be easier to build consensus against its abuse. However, that isn’t the case: Big Data is more often than not used in ways that don’t harm our personal liberties, and the misfortune is that the collective beneficence of these uses has so far been no match for the collective harm some of its misuses have achieved. Could this be because the potential for its misuse is almost everywhere?

    Yes. An often overlooked facet of Big Data is that its responsible use is not a black-and-white deal. Facebook is not all evil and academic ethnographers are not all benign. Zuckerberg’s social network may collect and store large amounts of information that it nefariously trades with advertisers – and may even comply with the NSA’s “requests” – but there is a systematicity, an orderliness, with which the data is being passed around. The existence of such a complex presents a problem in itself, no doubt, but that there is a complex at all makes it easier to attempt a fix than if the orderliness were absent.

    And this orderliness is often absent among academicians, scholars, journalists, etc., who may not think of data as a dollar note but are nonetheless processing prodigious amounts of it without being as careful as necessary about how they log, store and share it. Jayaram rightly believes that even if information is collected for benevolent purposes, the moment it becomes data it loses its memory and stays on the Internet as data; that if we are to be responsible data-scientists, being benevolent alone will be inadequate.

    To drive the point home, she recalled a comment someone had made to her during a data workshop.

    The Utopian way to secure data is to shoot your hard drive into space.

    Every other recourse will only fall short.

    Consent is not enough

    This memoryless, Markovian character of the data-economy demands a redefinition of consent as well. The question “What is consent?” is dependent on what a person is consenting to. However, almost nobody knows how the data will be used, what for, or over what time-frames. Like a variable flowing through different parts of a computer, data can pass through a variety of contexts to each of which it provides value of varying quality. So, the same question of contextual integrity should retrospectively apply to the process of consent-giving as well: What are we consenting to when we’re consenting to something?
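    (An aside, not from Jayaram’s talk: the Markov reference is to the ‘memoryless’ property from probability theory, where a process’s next state depends only on its present state and not on the path that led there. In the usual notation,

    P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n).

    In the same spirit, once information becomes data, how it circulates next depends only on what it is now – not on the context, or the consent, under which it was originally collected.)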

    And when both the party asking for consent and the party asked for consent can’t know all the ways in which the data will be used, the typical way-out has been to seek consent that protects one against harm – either by ensuring that one’s civil liberties are safeguarded or by explicitly prohibiting choices that will impinge upon, again, one’s civil liberties. This has also been increasingly done in a one-size-fits-all manner that the average citizen doesn’t have the bargaining power to modify.

    However, it’s become obvious by now that just protecting these liberties isn’t enough to ensure that data and consent are both promised a contextual integrity.

    Why not? Because the statutes that enshrine many of these liberties are yet to be refashioned for the Internet age. In India, at least, the six fundamental rights are to equality, to freedom, against exploitation, to freedom of religion, cultural and educational rights, and to constitutional remedies. Between them, the promise of protection against the misuse not of one’s person but of one’s data is tenuous (although a recent document from the Telecom Regulatory Authority of India could soon fix this).

    The Little Brothers

    Anyway, an immediate consequence of this typical way-out has been that one needs to be harmed to get a remedy, at a time when it remains difficult to define when one’s privacy has been harmed. And since privacy has been an enabler of human rights, even unobtrusive acts of tagging and monitoring that don’t violate the law can force compliance among the people. This is what the hacker Andrew Huang talks about in his afterword to Cory Doctorow’s novel Little Brother (2008),

    [In] January 2007, … Boston police found suspected explosive devices and shut down the city for a day. These devices turned out to be nothing more than circuit boards with flashing LEDs, promoting a show for the Cartoon Network. The artists who placed this urban graffiti were taken in as suspected terrorists and ultimately charged with felony; the network producers had to shell out a $2 million settlement, and the head of the Cartoon Network resigned over the fallout.

    Huang’s example further weakens the Big Brother metaphor by implicating not one malevolent central authority but an epidemic, Kafkaesque paranoia that has “empowered” a multitude of Little Brothers all convinced that God is only in the detail.

    While Watson’s essay (Data is the new “____”) is explicit about the power of metaphors to shape public thought, Doctorow’s book and Huang’s afterword take the next logical step in that direction and highlight the clear and present danger for what it is.

    It’s not the abuse of power by one head of state but the evolution of statewide machines that (exhibit the potential to) exploit the unpreparedness of the times to coerce and compel, using as their fuel the mountainous entity – sometimes so gargantuan as to be formless, and sometimes equally absurd – called Big Data (I exaggerate – Jayaram was more measured in her assessments – but not much).

    And even if Whitehead and Stach only draw parallels between The Trial and American society, the relevant, singular “flaw” of that society exists elsewhere in the world, too: the more we surveil others, the more we’ll be surveilled ourselves, and the longer we choose to stay ignorant of what’s happening to our data, the greater our complicity in its misuse. It is a bitter pill to swallow.

    Featured image credit: DARPA

  • Curious Bends – commoner panthers, space diplomacy, big data sells big cars and more

    Curious Bends is a weekly newsletter about science, tech., data and India, curated by Akshat Rathi and me. You can subscribe to it here. If you have feedback, suggestions, or would just like to get in touch, email us.

    1. Why the GM debate in India won’t abate

    It is a sign of the debate’s inadequacy that the question of genetically modified crops in India remains open, with no end in sight. Although public opinion is largely polarised, the government has done its bit to postpone resolution. For one, decisions on GM crops are made as if they were “technical answers to technical questions”. For another, no formal arena of debate exists that also addresses social anxieties. (8 min read)

    2. India’s black panthers are commoner than you think

    Camera traps installed by the Wildlife Conservation Society of India have shown that about one in ten leopard images belongs to a black leopard (that is, a black panther). These melanistic big cats have been spotted in wildlife reserves in Kerala and Karnataka, and seem commoner in the wetter forests of the Western Ghats. In fact, written records of sightings in these parts date from 1879, and could aid conservation efforts in a country that lost its cheetahs in 1960. (2 min read)

    3. One foot on Earth and another in the heavens

    For smaller and middle-income nations, strengthening institutional and technical capacity on the ground might be a better option than launching satellites because, rather than serving vanity, it leaves them better positioned to gather useful data. And if such a nation is in South Asia, then India’s planned SAARC satellite could make that choice easier, providing a finer balance between “orbital dreams and ground realities”. (5 min read)

    + The author, Nalaka Gunawardene, is a journalist and science writer from Colombo, Sri Lanka.

    4. Do big car-makers know their way around big data?

    When sales slumped, Mahindra & Mahindra, an Indian car-maker, used data gleaned from social media to strip its former best-selling XUV500 model of some features and sell it cheaper. The company declined to give further details. This isn’t unique – big car-makers around the world are turning to big data to widen margins. But do they know how best to use the data, or is putting the squeeze on this lemon just a fad? (6 min read)

    5. A geothermal bounty in the Himalayas

    As the developing world edges toward an energy-sufficiency crisis, scientists, environmental conservationists and governments get closer to a Mexican standoff. Nowhere is this better highlighted than with the gigawatts of geothermal energy locked up in the Himalayas. A 20-MW plant could “save three million litres of diesel”, $2 million and 28,000 tons of carbon dioxide in northern India per year. Why isn’t it being used? (2 min read)

    Chart of the week

    “Both [female genital mutilation and child marriage] stem from deeply rooted social norms which can only be changed by educating parents about the harm they cause. Making foreign aid conditional on results gives governments an extra incentive not just to pass laws, but to enforce them. Police and women’s activists in some countries have set up phone hotlines and safe houses for victims or girls at risk. Most important … is to make sure that girls go to school and finish their studies.” The Economist has more.

    [Chart from The Economist on female genital mutilation and child marriage]

    If you learnt something new from Curious Bends, why not spread the word? Share this week’s newsletter with your friends and ask them to subscribe. Have a nice day!