Big Data vs Small Data

Posted on: Friday 10th of February 2012

In theory, there need not be a contradiction between Big Data and privacy. If you take a medical record, a bank account or an individual’s mobile phone location data, you can remove all personal identifiers (name, address, customer number, IP address, mobile phone identifier) so that all you have left is a bundle of attributes, which you can crunch together to identify patterns and trends*.

Having identified a pattern or profile you can then match it back to an individual’s data, to place the person in a segment, to predict what they might do next, or perhaps offer them ‘people like you’ insights.

If, however, personal identifiers remain attached to the attribute data, you have a potential privacy nightmare.
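As a rough illustration of the "strip the identifiers, then crunch the attributes" idea, here is a minimal Python sketch. The field names, the sample records and the helper functions are all hypothetical, invented for this example; real de-identification would need to deal with far messier data.

```python
from collections import Counter

# Hypothetical field names treated as personal identifiers,
# loosely mirroring the list given in the text above.
IDENTIFIER_FIELDS = {"name", "address", "customer_number", "ip_address", "phone_id"}


def strip_identifiers(record):
    """Return a copy of the record with the identifier fields removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}


def count_patterns(records, attributes):
    """Count how often each combination of attribute values occurs."""
    return Counter(tuple(r.get(a) for a in attributes) for r in records)


# Made-up sample data, for illustration only.
records = [
    {"name": "A. Example", "customer_number": "C001", "age_band": "30-39", "product": "loan"},
    {"name": "B. Example", "customer_number": "C002", "age_band": "30-39", "product": "loan"},
    {"name": "C. Example", "customer_number": "C003", "age_band": "50-59", "product": "savings"},
]

anonymous = [strip_identifiers(r) for r in records]
patterns = count_patterns(anonymous, ["age_band", "product"])
print(patterns.most_common(1))
```

The pattern counts say nothing about any named person. The privacy question only arises when a pattern is matched back to a record that still carries its identifiers.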

So let’s take a look at what different people are saying about the Big Data opportunity. In an article called ‘How Big Data can fuel bigger growth’, Accenture refers to “one of the most valuable but underused assets a company already has: its customer information. Call it Big Data”.

In this context, it seems that Accenture is thinking of ‘Big Data’ as personally identifiable data. It then offers some examples of how companies can monetise this data.

“Take the auto industry. Since many vehicles now feature GPS and telematics systems, some car manufacturers have been able to collect and monetize a wealth of data on customer driving habits.

“General Motors Co.’s OnStar telematics system, for example, not only provides vehicle security, information and diagnostics services to drivers, it also captures telemetry data. In 2007, OnStar and GMAC Insurance partnered to create an opt-in program that uses the telemetry data to offer lower insurance premiums to customers who drive fewer miles. Thanks to the program, consumers can save significantly on car insurance, which boosts GM’s customer satisfaction performance. This, in turn, helps GM attract new OnStar paying customers.

“In another example, in 2009, American Express Co. launched an analytics and consulting business that draws on the purchasing behavior of its 90 million credit card holders across 127 countries. This organization, American Express Business Insights, hopes to attract direct marketers by using proprietary data to enhance customer acquisition and retention programs.”

So here, ‘Big Data’ is being equated with personalised information services and with the sale of customer data to third parties. Hmm.

Or take Pitney Bowes, which says in a recent press release that it is “important for companies to start treating Big Data as a digital asset”. Companies need to “harness customer data intelligently, and better understand each individual customer’s needs and behaviours,” it continues. Hmm again. Is this anonymous Big Data working at the level of statistics? Or is it Big Brother Data, sweeping up all the information that may be available about the individual?

In its original paper on Big Data, McKinsey noted that “personal data such as health and financial records are often those that can offer the most significant human benefits”. But at the same time it noted that “Many citizens around the world regard this collection of information with deep suspicion, seeing the data flood as nothing more than an intrusion of their privacy.”

Instead of addressing these concerns in detail, however, McKinsey resorts to hand-waving. “It is clear that individuals and the societies in which they live will have to grapple with trade-offs between privacy and utility,” it says, and leaves it at that. So McKinsey is admitting there’s a problem, but shrugging it off as a price to pay.

This is how scandals are born: difficult problems are brushed aside with a cavalier attitude, which reinforces those ‘deep suspicions’ and comes back to haunt those involved. Every time you see the words ‘Big Data’, I suggest you read ‘Big Backlash’.

* Research has shown that it is often possible to work back to an identified individual from attribute bundles, so this is not foolproof. But in a ‘pure’ Big Data context, where the focus is on the aggregated data, it is not necessarily a major problem.
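To make the footnote concrete, here is a small Python sketch of why attribute bundles are not foolproof: any combination of attribute values that occurs only once in a dataset points at exactly one record. The attribute names, the sample data and the `singled_out` helper are assumptions made up for this illustration, not a description of any particular study or method.

```python
from collections import Counter


def singled_out(records, attributes):
    """Return the attribute-value combinations that match exactly one record."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


# Made-up records with all names and account numbers already removed.
records = [
    {"birth_year": 1975, "postcode_area": "AB1", "sex": "F"},
    {"birth_year": 1975, "postcode_area": "AB1", "sex": "M"},
    {"birth_year": 1980, "postcode_area": "CD2", "sex": "M"},
    {"birth_year": 1980, "postcode_area": "CD2", "sex": "M"},
]

# With three attributes combined, two records stand alone.
unique_full = singled_out(records, ["birth_year", "postcode_area", "sex"])

# With only one coarse attribute, nobody is singled out.
unique_coarse = singled_out(records, ["postcode_area"])

print(len(unique_full), len(unique_coarse))
```

The more attributes are kept in the bundle, the more combinations become unique, which is why work at the aggregated level carries less risk than work at the level of individual records.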