
Monday, 11 March 2013

A New Map of the World

Big Data started its technology hype cycle some years ago, driven by the usual combination of novelty (lots of data, care of the Internet), Moore's Law (the processing power to analyse it), and hucksterism (this is the new competitive edge). Classic public examples include Google's ability to predict flu outbreaks by analysing searches for treatments by geographical area, and the same company's analysis of the historical distribution of words in texts scanned by Google Books. These are largely benign uses, but more questionable ones, such as identifying people who are good bets for usurious loans, are being dreamt up all the time.

The chief delusion of Big Data is that it can tell you anything (which could mean it tells you either nothing or everything - it doesn't handle ambiguity terribly well). This isn't just another species of techno-nonsense, and the hyperbole isn't restricted to the usual gaggle of utopians and commercial boosters. Like all panaceas and cults, from plastic to Scientology, it has ambitions to be universal and totalising: this will change your life, it is inescapable, resistance is useless. It is this claim to universality that has seen it gradually seep into the realm of politics, partly as a result of the managerialist ideology of consultancies like McKinsey, and partly as a result of the Nate Silver effect - i.e. the success of the data-nerds in doing a better job of predicting the outcome of last year's US Presidential election than a bunch of right-wing fantasists with a sixth sense.

In June 2008, three months before Lehman Brothers imploded, Chris Anderson, the editor-in-chief of Wired and author of the Long Tail meme, published a piece on Big Data in which he suggested that the sheer volume and extent of data available through the Internet would make theory and the scientific method essentially redundant. That's a big punt. His thinking was that the method of traditional science was based on the difficulty of gathering and analysing data. Because we could not directly study or comprehend everything, we relied on representative sampling, which gave rise to certain structural practices such as theoretical models, randomised controlled trials, and meta-analyses. In the "Petabyte Age" ("sensors everywhere, infinite storage and clouds of processors"), the "data deluge makes the scientific method obsolete". We don't need a theory, and we don't need to test one; we can deduce a single, authoritative answer to any new question from the existing data.

There is an obvious flaw. As Will Davies notes, the cheerleaders of Big Data believe that "it is collected with no particular purpose or theory in mind; it arises as a side-effect of other transactions and activities". But of course all data are gathered with a purpose in mind, or at least with the potential of a purpose, which is why Google were criticised for picking up unsecured wi-fi data via Street View. Datasets are not value-neutral. They are the result of discrimination, even if arbitrary or accidental, and ever-larger datasets do not necessarily mitigate this as they may have low signal-to-noise ratios. And all that is before you even consider the erroneous assumptions built into the analytical algorithms. Of course, this observation, that data are biased in their collection and representation, can in turn be used as an argument for elite control. Though Big Data may make some expertise redundant, it opens up the field for new data specialists who can "correctly" weight and interpret the data.


Politics has become increasingly prone to the demand for evidence-based policy-making as an alternative to crude belief, but the very idea that everything can be measured and thereby managed is itself an ideological construct. It should be no surprise that Big Data is increasingly being positioned as tomorrow's solution to today's problems. This is the classic strategy of building ideologies that are above politics, like religion once was and neoliberal economics has become. Though there is limited tolerance for technocracy as a solution (witness Mario Monti), the principle itself has yet to be widely accepted as intrinsically repellent. We want to believe that scientists and technocrats act in good faith, and are thus happy to accept that any flaws in their actions are likely to be the result of inadequate data rather than prejudice or self-interest. Big Data furthers this ideology by holding out the prospect that complete data can provide perfect knowledge.

The social sciences have been dominated by two hegemonic disciplines in recent decades. Economics has seen humanity as a common tabula rasa on which individual preferences are written. Biology has seen individuals as instances of a common class, with our "truths" (illness, heritage) written in DNA. These are apparently polar opposites - the self-actualiser and the prisoner of genetic destiny - but they share a common view that we, both as individuals and as a society, are simply the aggregation of data. Big Data promises to fully reconcile the two: making sense of our preferences in aggregate, and finding the personal needle in the social haystack. What's not to like?

What this utopian vision relegates to the background is the way that Big Data creates a whole new asset class, even though this is its central proposition in the commercial context. Political reservations about it tend to focus on privacy and personal liberty, i.e. the fear that one's own property will be scrutinised or alienated, with less discussion about the "commons" or public goods. Big Data presumes universal datasets - data that is complete, comprehensive and consistent. What this means in practice is a preference for monopolies, though we blithely accept that these will be private rather than public, Google and Amazon rather than the state. Big Data = big corporations. The ultimate logic of this is a future Mark Zuckerberg being commissioned by the government to determine your healthcare eligibility based on your lifestyle preferences. And unlike an ATOS assessment, there will be no appeal.

The absurdity of Big Data's totalising ambitions was made beautifully clear many decades ago by the Argentinian writer Jorge Luis Borges in his fable On Exactitude in Science.
In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
This is a variant on the principle of "the map is not the territory", the confusion of a model with reality. The evangelists argue that this is precisely the problem that Big Data circumvents, in that it isn't a model but reality itself - the data is the one truth. But it's interesting to note how Big Data and visualisation so often go hand-in-hand, as if we need the raw truth to be mediated by a slick interface. We want to believe that the data is authoritative, but we also want it presented like an unthreatening cartoon. As Borges's near-contemporary, Paul Valéry, said: "Everything simple is false. Everything which is complex is unusable."
