Posted on: Tuesday 17th of January 2012
‘Big Data’ seems well on its way to becoming the next big bandwagon. However, in the scheme of things I think it’s more like a Big Dead End.
Before saying why, let me first recognise where and how ‘Big Data’ is important.
Our society is transitioning from an ‘old normal’ of information scarcity to a ‘new normal’ where massive new data sets continue to come on stream. Just think of the volumes of data now being created in payments (the shift from cash to plastic to contactless and mobile payments systems), mobile (location data), online (search, click streams and the like), and even travel: just think how much data Oyster is collecting about Londoners’ travel activities.
Clearly, there’s tremendous value in mining these vast new data sets to identify trends and patterns, and in crunching different data sets together to find even more trends and patterns. All great stuff. No question.
So why claim that Big Data is actually a Big Dead End?
First, we need to beware the hype. To take just one example, Big Data does not mean that now “anything can be predicted”. Yet that’s what one breathless book – Super Crunchers by Ian Ayres – claimed a few years ago. That sort of hype really isn’t helpful.
More important, Big Data is actually just a case of ‘more of the same’. It fails to address the really big information challenges (and opportunities) that we now face. Let’s unpick this a little.
Statistics and specifics
Big data is all about statistics: divining patterns and trends from large data sets. Statistics are incredibly powerful and useful for the way they challenge the assumptions and inferences naturally made by human minds – many of them faulty. As I said, that’s great.
But if we look at the really big value gap faced by society nowadays, it’s not the ability to crunch together vast amounts of data, but quite the opposite. It’s the challenge of information logistics: of how to get exactly the right information to, and from, the right people in the right formats at the right time. This is about Very Small Data: discarding or leaving aside the 99.99% of information I don’t need right now so that I can use the 0.01% of information that I do need as quickly and efficiently as possible.
Very Small Data is a different class of data to Big Data because it’s about the specific content of different pieces of data – and the ability to sift this unique value from all the rest – even if it is to do something really boring and mundane such as get a local authority parking permit.
To get a parking permit, my local council tells me I need two proofs of address, a vehicle registration document, and a valid insurance certificate. All of them are highly specific bits of information. There are, for example, around 31 million cars on the road in the UK right now, but to get that parking permit the local authority is not interested in information about the other 30,999,999 cars: just the details of my one. And particular, specific information about my car needs to be linked to me personally, not the other 50 million people eligible to drive a car.
So there are two classes of data which help solve different types of problem. Big Data is statistical and deals with general trends and patterns; Very Small Data is specific and deals with getting things done: gathering the information needed to make a decision, to make an arrangement, or to get some administrative chore done. Because it’s Very Small and rather mundane and specific, it doesn’t seem as glamorous and important as Big Data. But it is.
In fact, this is where our economy’s next big productivity breakthrough is going to come from: information logistics – getting exactly the right information to and from the right people at the right time so we can solve problems, make decisions, organise and implement things without wasting time and effort looking for the right data or sifting through and discarding the wrong data.
This is a massive operational and design challenge with only one real historical parallel: mass production. The explosion of wealth creation that defined the 20th century economy came from the discovery of a highly efficient way to assemble multiple different components into products. Each product required its own set of components – and only those specific components (no more, no less). They, in turn, needed to be put together in the right order and in the right way. For this to happen, each product required its own carefully designed supply chain and assembly processes. Each assembly of components then came together to create a specific product which did a specific job. Once we had cracked this formula of ‘mass production’, there was no limit to the range of products we could make: fridges, cookers, microwaves, vacuum cleaners, cameras, radios, televisions, telephones, calculators, computers, motor cars, hedge trimmers, lawn mowers, hair dryers, light bulbs and so on ad infinitum.
Today, the truly gigantic economic opportunity is to do the same with information products – by which I mean the assembly and integration of many different, specific information components that are needed to get a specific job done such as renew a parking permit, or research the purchase of a new car (or fridge, or cooker), or organise a life event such as ‘move home’, or manage a life process such as ‘manage my money’. Each one of these information jobs requires its own unique set of specific information components, which need to be assembled in the right way, and in the right order, to create a workable solution.
There are countless such putative information products out there – millions of them. But at the moment most are addressed in a highly inefficient, bespoke or DIY craft sort of way: made separately and painstakingly, one at a time. What we need for that next big breakthrough in productivity is the information logistics equivalent of the mass production assembly line: the infrastructure and supply chains necessary to bring the right information together in the right ways at the right times for the right people.
And, no matter how big, exciting and impressive Big Data is, that’s one thing it cannot do because it is dealing with statistics, not specifics. Instead, all it really offers is more of the same: more data collection by the same entities leading to more data crunching. While the volumes of data now being generated may be unprecedented, Big Data is actually just a continuation of a very old trend, not something new.
The Big Blindspot
The second ‘more of the same’ thing about Big Data is its organisation-centric assumptions. In many (though not all) of its manifestations, Big Data is actually customer data: data about customer behaviour. This is something that organisations collect about their customers in order to do something like offer them a product or send them a message. It’s all about helping organisations do more, more efficiently.
But if we think about the information logistics problems I’ve just mentioned, they don’t only revolve around the specific needs of specific individuals (rather than the general statistical trends of Big Data); they are also highly personal. They’re not only using personal data; they are helping the individual do something.
This is the Big Blindspot of organisation-centric thinking. It naturally assumes that all future improvement will come from helping the organisation do things better when, in fact, the biggest opportunities might lie elsewhere completely – in helping people as individuals to do things better via new types of person-centric service.
Helping people do things better locates the epicentre of new value creation outside the boundaries, and control, of the current organisational set-up. For many organisations (including those that are excited about Big Data) this doesn’t (yet) compute. But that’s just a function of outdated assumptions. Once you start looking for the potential outside of the organisation-centric model, you can see it everywhere.
Big Data and Important Data
The third thing that’s small about Big Data is that, even with the vast amounts of data it deals in, it’s actually only skimming the surface of a deep sea of computation. The brain of the average Joe is said to contain on the order of a hundred trillion connections between its neurons. Big Data doesn’t touch what’s going on in people’s heads – their motivations, their intentions, their goals, priorities and so on. All it does is collect after-the-event data about what they do once they turn these thoughts and feelings into actions – and then it’s only those actions which register on some particular data gathering system.
So Big Data never goes to the source – the fountainhead of Really Big and Important Data: human beings and what they want to do right now, or plan to do in the future. Yet, thanks to all the technological revolutions that are going on right now, we are beginning to develop the ability to do just this. People are acquiring the ability to manage their own information, to input their own information and to share it with other people. This is creating an avalanche of Really Big and Important Data: Volunteered Personal Information – information about me, my priorities and circumstances and what I want to do right now.
It’s this VPI that lies at the heart of all the information logistics problems we’ve just been talking about. Information logistics is driven by highly particular specifications – the input of information about what a particular person needs and wants right now – rather than the post-hoc crunching of data that happens to have been collected. Using Big Data you can infer a motive or intention from a pattern, but that’s never quite as good as getting the information direct from the horse’s mouth.
VPI and its connection to information logistics is the Really really Big Data challenge facing our society right now. How to unleash the incredible riches of this massive data resource which, like the oil in the ground before the 20th century, has always been there but has remained inaccessible, out of reach, and untapped? This is not about data collection and crunching – it’s about data sharing, a completely different type of technology and infrastructure problem to the one addressed by Big Data.
Big Data or God Quest?
The fourth problem with Big Data is that it has a logical flaw in it. Big Data may be Very Big but it is still siloed. If you manage to collect all the location data in the world, it still misses entire universes of data dealing with other things such as transport, or purchases, or searches. And most of the data that’s collected is not about the activities of the complete marketplace; only the data that can be collected by an individual company.
As Accenture point out in a recent article on Big Data, “No single organization has all of the data it needs to meet the demand for information services products; as a result, the ability to take your own information, combine it with other data and make it uniquely valuable via robust analytics will be critical to success.”
The big question for any company entering the Big Data arena, Accenture concludes, is “Can we combine our data with information from others and then use sophisticated data analysis to create differentiated products?”
Ah! Please note how one of the first conclusions drawn by a proponent of Big Data is that the Big Data you’ve got is not big enough. To really make it valuable, you need Even Bigger Data. This is a trap that CRM practitioners have already fallen into. The data you collect is useful. But it has holes which you need to fill. Which leads you to collect more data. But this data still has holes. So you need to collect even more data. Ad infinitum.
What this does is suck practitioners into a sort of God Quest: the quest for the perfect, complete and all-encompassing database: omniscience. This is not just a pipedream. It’s a potential black hole of time, money and resources – something to beware of, at least, and something which is avoided by the alternative route of improved information logistics.
Climbing to the Moon
The Big Data bandwagon is supposedly driven by ‘evidence’ – evidence that it delivers benefits yesterday’s ‘small data’ failed to deliver. Trouble is, it’s a misleading sort of evidence.
Imagine a world where every inch you got closer to the moon, you got rewarded. First, you climb the highest mountain. Then you start building skyscrapers, shouting all the while ‘See, we’re getting closer to the moon. We have demonstrable evidence of success. Look at the rewards we are getting!’
But if you really want to get to the moon, you don’t climb mountains and build skyscrapers. You build rockets. A completely different activity.
Big Data is a massive investment in building skyscrapers. It’s well on its way to becoming organisations’ next big displacement activity – investing huge amounts of time, money, resources and effort in something that does not address the biggest, most important opportunity.
Displacement activities of this sort are incredibly wasteful and damaging. By all means use big data sets for what they are good at – statistical analysis of trends. But don’t bother trying to climb to the moon.