Thursday, 14 June 2012

Big data means big money

The Draft Communications Data Bill has been greeted as a charter for "online snooping" by its critics, who have in their turn been accused by Theresa May, the Home Secretary, of being "conspiracy theorists", though she didn't specify which theory she had in mind. Perhaps a cross between The Matrix and the Illuminati.

The Web is now roughly 20 years old (measured from Mosaic, the first widely used GUI browser) but lawyers are still trying to get their heads around it. The draft bill is an open admission that the Regulation of Investigatory Powers Act (RIPA) of 2000, which also caused a "snooping" furore, was inadequate. The preamble to the new bill states:

Communications technologies and services are changing fast with more communications taking place on the internet using a wider range of services, including voice over internet, online gaming and instant messaging. Communications data from these technologies is not as accessible as data from older communications systems like ‘fixed line’ telephones. Although some internet data is already stored by communication service providers, other data is neither generated nor obtained because providers have no business need for it.

The bill centres on communications data, which is more properly defined as meta-data. In other words, who called whom when, rather than the content of the conversation. Back in the day, this meant data on calls made between phone numbers. In the case of the Internet, this means IP addresses, URLs, message headers etc. The purpose is to establish relationships - who is consorting with whom. This can provide sufficient justification for a more intrusive investigation, such as a wire-tap or the handing-over of emails, to be sanctioned by a warrant.

A real concern should be whether the pattern of relationships displayed by Internet data means the same as it would in the case of telephone data. In other words, are we simply taking an existing paradigm forward unthinkingly? Phone calls generally do reveal "known associates", but it's less obvious that tweets or blog comments do the same.

A lot of random stuff happens online, largely because you are following a chain of hyperlinks rather than making a single point-to-point connection. Apparently deliberate activity is effectively directed by the content rather than by you, and that's without considering automated pop-ups. This means a huge volume of insignificant noise. While a lot of this can be filtered out, there are likely to be many more false positives compared to telephone data where wrong numbers are the only real issue.

The other side of this is shown by the Google Street View brouhaha, where the commercial desire to record wi-fi hotspots resulted in scanning unencrypted transient data. This is held up by some as evidence that Google may really be evil at heart, or is even used as a roundabout defence of press phone-hacking. The correct reading is that the world is awash in data and most people are unaware that what they think is private is actually public. Boundary issues like this, where companies or governments blunder over the line, will become more common. As ever, cock-up is more prevalent than conspiracy.

The last sentence in the above preamble is interesting. Communication service providers (e.g. ISPs) may have "no business need" for data on who consorts with whom but application service providers (e.g. Facebook) certainly do. This however falls into the realm of content as the relationships have been defined by the user rather than being an inference from communication. The draft bill is assuming that technology will allow the meta-data and content to be separated, though the distinction is increasingly blurred. Is a "like" content or communications data?

The planned data harvesting and analysis infrastructure is expected to cost £1.8bn over 10 years. The obvious concern is the cost, which will undoubtedly prove an underestimate - it's a government IT project, after all. A greater concern should be the assumption that a system can be built now in confident anticipation of the service landscape in 2023. The risk is that the cost balloons while the solution fails to keep pace with evolving technology. We end up with an expensive white elephant.

In practice the world of digital communications is too extensive and volatile for any massive surveillance infrastructure to be very effective, despite the claims being made now for "big data". The cheap availability of communication tools and media is so central to the modern economy that this is unlikely to change, and that in turn means there will always be easy ways to circumvent surveillance. In this respect both Theresa May and many of her critics are victims of the same anachronistic delusion, applying the "Orwellian" paradigm to a post-Orwellian world.

Perhaps this is deliberate. Perhaps the real purpose of all this talk of the competing needs of counter-terrorism and privacy is to distract from a project that will transfer billions of pounds of public money to consultancies, systems providers and network operators, at a time when we're meant to be cutting our cloth. Now that's a credible conspiracy theory.


  1. Are you suggesting it is a huge Keynesian boost without having to admit that Georgie was wrong?

  2. If only. The sums involved are negligible in terms of a stimulus. This is more an example of the corporate-political nexus extracting rent from the public coffers.

    The Leveson Inquiry is focusing attention on the mutual interests of politics and the media, but this serves to distract from the bigger issue that has been brewing over the last 5 years, namely the extent to which an ideological aversion to public expenditure has allowed corporates and their PR arm (McKinseys etc) to compensate for low private sector growth by securing government business. We're not just bailing out banks, we're shipping cash to the likes of A4e, Capita, Serco etc.

    This is not just a simple continuation of privatisation, but has now morphed into the creation of dubious projects and schemes to deliver services that did not previously exist. The mandatory work programme has already been shown to have little benefit at considerable cost, while the recent demonisation of 120,000 "troubled" families by Pickles and Duncan Smith is a clear attempt to create a market segment that private providers can manage in return for public cash.

    May's initiative is another example. You can expect more.

    1. Did you see Krugman on crony capitalism? Not sure if he's not trying a bit too hard to find something bad to say, but I agree with both of you that there is a commitment either not to have govt spending or not to admit it (ie by hiding behind infrastructure bonds when it would be cheaper just to build stuff).
      Though I have an ideological aversion to summer football, I enjoyed The O'Walcott etc.