
When Big Data meets Personal Data

Posted on: Friday 3rd of May 2013

Some interesting sessions at the DataIQ conference this week. My talk was on the theme ‘Where Personal Data Meets Big Data’. One meeting point takes the form of a clash: the requirements of Big Data and of personal data protection are at loggerheads. Big Data is all about collecting as much data as you can, keeping it as long as possible (to identify trends), and using the data for as many purposes as possible (because it’s all about discovering new patterns, correlations and insights).

Data protection rules on the other hand require informed consent for the collection and use of data (how can individuals give informed consent if they don’t know what their data is going to be used for?), only collecting and using data for pre-specified purposes, and keeping this data for as short a time period as possible.

Some Big Data enthusiasts are becoming cavalier about this contradiction, risking a regulatory and reputational backlash. My point, however, was that this isn’t just a formal compliance issue. It’s a data strategy issue: being clear about what sorts of data you want to use, and for what purposes. Here is a short video, ‘When Big Data meets personal data’, which sums up my argument.

The dream of perfect targeting

A good example of a lack of clarity is the current Big Data-fuelled dream of marketing omniscience. Decades of experience have taught marketers that the more data they can collect and analyse, the better their targeting gets. The pattern has been repeated many times with the use of geodemographic data, transaction data, social media data, web analytics and so on.

This has encouraged the dream of perfect targeting: the belief that we are on the verge of accessing enough data to give us perfect consumer insight, enabling marketers to send consumers exactly the right message at exactly the right time, delivering perfect relevance and satisfaction to the consumer, with zero waste and maximum ROI for the marketer.

While the vision is laudable, there’s little evidence we’re close to achieving it using current marketing methods. Most claims are based on uplifts in response rates. The trouble is, a 100% increase in response rates, from 1% to 2%, still leaves the 98% who are not responding. And the more we focus on ‘100% uplift in response rates!’, the more we ignore that 98%.
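The response-rate arithmetic above can be sketched in a few lines (illustrative numbers only, not data from the talk):

```python
# Illustrative sketch of the response-rate arithmetic (assumed numbers).
audience = 1_000_000

baseline_rate = 0.01   # 1% of the audience responds
improved_rate = 0.02   # 2% respond after a "100% uplift"

# The uplift headline, and what it hides.
uplift = (improved_rate - baseline_rate) / baseline_rate
non_responders = audience * (1 - improved_rate)

print(f"Uplift in response rate: {uplift:.0%}")
print(f"Still not responding: {non_responders:,.0f} of {audience:,}")
```

The headline figure doubles, yet the overwhelming majority of the audience is untouched either way.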

As I pointed out in my talk, many significant structural barriers block the dream of perfect targeting. Here are some of them.

Trust The more data we try to collect about people, the creepier it gets and the more resistant people become. That’s why privacy, and the value people get from their data, is becoming one of the hottest political potatoes of our time.

Regulation Falling levels of trust are driving regulators to tighten data protection rules, not relax them.

Statistics The dream of perfect targeting is based on the assumption of perfect predictability. But perfect predictability doesn’t necessarily lead to perfect targeting. If I toss a coin many times, I can predict perfectly that 50% of the time it will be heads, and 50% of the time it will be tails. But knowing this doesn’t help me predict the result of the next toss. This simple confusion about the nature of statistical prediction lies behind many a bogus marketing claim, such as ‘we know what customers are going to do before they know themselves’. Even if you have a pretty good idea about this at a statistical level, it won’t help you know what any particular customer is going to do next. This is why the claimed potential of many marketing analytics strategies is so exaggerated.
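The coin-toss point can be checked with a quick simulation (a hypothetical sketch, not from the talk): the aggregate share of heads is highly predictable, while a rule for calling the next individual toss does no better than chance.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
tosses = [random.choice("HT") for _ in range(100_000)]

# The aggregate is predictable: the share of heads sits very close to 50%.
heads_share = tosses.count("H") / len(tosses)
print(f"Share of heads: {heads_share:.3f}")

# The individual outcome is not: guessing each toss from the previous
# one (or by any other rule) is right only about half the time.
correct = sum(prev == nxt for prev, nxt in zip(tosses, tosses[1:]))
accuracy = correct / (len(tosses) - 1)
print(f"Next-toss prediction accuracy: {accuracy:.3f}")
```

Knowing the distribution exactly still leaves each individual prediction at coin-flip odds.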

Knowledge gaps Even with today’s data explosion, there are still huge knowledge gaps in the marketer’s armoury. There are, for example, whole classes of problems where answers remain unknowable. Some of the biggest data sets in the world are used to predict the weather. Yet we still can’t get it right for more than a day ahead. That’s because weather patterns are chaotic – intrinsically unpredictable.

Clockwork mechanisms are predictable. Complex systems tend to be chaotic and unpredictable. The way many people talk about Big Data-driven analytics and targeting suggests they are treating both types of system as though they are the same. This is a fundamental category error. What sort of beast is ‘the market’? Is it a piece of clockwork with perfect predictability, or is it more like the weather: intrinsically chaotic and unpredictable?
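The weather comparison has a classic minimal illustration: the logistic map, a one-line toy ‘complex system’ in which two starting points a billionth apart soon diverge completely (a textbook sketch, not something from the talk):

```python
# Logistic map at r = 4: a standard toy example of chaotic dynamics.
def logistic(x, r=4.0):
    return r * x * (1 - x)

a, b = 0.2, 0.2 + 1e-9   # two starting points a billionth apart
max_gap = 0.0
for step in range(50):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))

# The tiny initial difference grows roughly exponentially until the two
# trajectories bear no relation to each other.
print(f"Largest gap over 50 steps: {max_gap:.3f}")
```

A clockwork system would keep the two trajectories a billionth apart forever; a chaotic one amplifies the difference until prediction breaks down, no matter how much data describes the starting point.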

There are also huge swathes of information that marketers still cannot access. There is, for example, all the consumer knowledge that remains undigitised because it’s still sitting, untouched and unreachable, inside individuals’ heads: their goals, plans, priorities, reasons why, changing circumstances and so on. This is information individuals could tell marketers if they wanted to and had a means of doing so. But the current focus on ‘targeting’ instead of involving, along with the related trust issues, militates against this.

For all these reasons, I suggested in my talk, the dream of perfect targeting is not a compelling reason for getting into Big Data. Which takes us back to the data strategy point: know what data you are gathering and using for what purposes.

VPI is also ‘Big Data’

By the way: all the above structural limitations on targeting are why Ctrl-Shift has been banging on about Volunteered Personal Information (VPI) for so long. Done the right way, VPI cuts through these Gordian knots: building trust, navigating regulatory issues, sidestepping statistical confusions, and filling knowledge gaps. That’s why it’s driving such significant breakthrough efficiencies.

When we look at its potential richness, scale and volume, VPI brings completely new dimensions of actionable data to the party. It’s another universe of big data in its own right. Yet it is completely ignored by Big Data enthusiasts.

That’s not to say traditional Big Data is useless and pointless. On the contrary, it has many uses and much potential value. But it’s not the answer to everything. It’s useful where it’s useful, and other sorts of data, such as VPI, are useful where they are useful. Which takes us back to our starting point: has your organisation really got a robust data strategy?

Alan Mitchell