Panorama and the Big Data hype cycle

Posted on: Friday 5th of April 2013

When hype cycles reach their peak, the connection between the thing at the heart of the hype (which may or may not be important) and what people say about this thing frays to the point of breaking. Unjustified extrapolations, logical leaps, ignoring contrary evidence, omitting consideration of other factors – these all become fair game once the hype takes over.

Panorama, a UK documentary programme with a reputation for seriousness, aired a programme last night showing the Big Data hype cycle at its peak. Let’s take a few examples.

Big Data in financial markets

In one section it implied that Big Data techniques could be used to ‘beat’ sophisticated financial markets. Using the example of coffee, it showed how traders were gathering data from diverse sources (trends in consumer demand, weather patterns, political stability in growing nations, etc.) to build sophisticated models of how these markets work, so that they can find tiny signals in trading patterns which they can exploit to make money.

What it failed to point out was that the firm it profiled is not alone in its attempts to do this. Everyone is at it. In fact, over two thirds of all trades in US financial markets are now algorithm-driven – ‘untouched by human hand’. They can’t be touched by human hand, because these algorithms are conducting 4000 trades every second, so fast that hundreds of trades are made in the time it takes for a piece of information to enter a human brain and for that brain to respond, say, by pressing a button.

This means that the vast majority of the ‘signals’ being monitored by these algorithms are not coming from the real world of coffee growing and consumption, but from the activities of other algorithms. And increasingly, these algorithms are being written with the activities of other algorithms in mind.

For example, if you want to buy a particular stock, you might want to trigger a run on that stock by generating ‘sell’ signals. If you succeed, and lots of people start selling, you can buy that stock at a lower price. In fact, ‘wanting to buy the stock’ has nothing to do with it. If you can trigger a run on a stock you have sold short (that is, sold borrowed shares in order to buy them back more cheaply later), it doesn’t matter what the stock is, because you can make money on the trade. Any stock will do. So, what you do is write algorithms designed solely to create, look for, respond to, and game the exponentially exploding array of signals and patterns … that are being created by other algorithms.

In this way, far from creating better models of how the real world works, financial trading algorithms are severing the last remaining links between the real world and the activities of financial ‘markets’. Yet the prices of most of our foodstuffs, raw materials and currencies are determined by these casino activities. This is toxic lunacy on a gargantuan scale, which regulators are desperately struggling to catch up with (and, so far, failing): toxic lunacy which Panorama presents as a clever Big Data breakthrough.

Big Data in advertising

The programme then moved on to Big Data in advertising. Every call, click, text, search or journey individuals make generates new data – 2.5 billion gigabytes of it a day. Panorama profiled a company that mines this data to serve up ads. “If we can collect enough data about past behaviour, it could be useful to predict what people might want to buy,” the spokesman explained.

The problem, he went on, was how to sift signal from noise in these vast piles of data. The trick: to employ a rocket scientist (literally, a rocket scientist from Nasa) to apply an arcane branch of mathematics called decision theory to cut down the number of variables used to drive the ad-serving process. (At which point, the explanation goes opaque.)

Instead, to illustrate how decision theory works, the programme shows the rocket scientist using it to do his grocery shopping. Except he didn’t use decision theory at all. He chose what he liked.

A couple of points here. The operative word in the quote above is not ‘predict’ but ‘might’. Guesswork, in other words. Response rates to Big Data-driven behaviourally targeted advertising average around one in a thousand. Heavy-duty data crunching and analytics generate an uplift of 10-20% if you’re lucky. So, instead of one person in a thousand responding, at best 1.2 people in a thousand do, leaving the other 998.8 as unengaged and unimpressed as ever.
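The arithmetic behind those figures can be checked in a few lines. This is only a sketch: the base rate and the uplift range come from the paragraph above, while the function name is made up for illustration.

```python
def responders_per_thousand(base_rate, uplift):
    """Expected responders per 1,000 ads served, after a relative uplift."""
    return 1000 * base_rate * (1 + uplift)


base_rate = 1 / 1000  # "around one in a thousand", as quoted above

for uplift in (0.10, 0.20):  # the 10-20% uplift range quoted above
    responders = responders_per_thousand(base_rate, uplift)
    ignored = 1000 - responders
    print(f"uplift {uplift:.0%}: {responders:.1f} respond, {ignored:.1f} do not")
```

Even at the top of the range, the change is 0.2 extra responders per thousand ads.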

When you are investing vast sums of money to serve up huge volumes of ads, this sort of uplift is worth it. It generates some sort of ROI. But it’s investing vast sums of money to play on the margins of the margins. And there is an alternative. Instead of investing these vast sums gathering huge amounts of data about people behind their backs (thereby potentially infringing their privacy), and trying to predict what they are going to do next, you could ask them what they like, or plan to do next. In other words, you could deal with them at the level that the rocket scientist actually behaved: the level of individual human behaviour.

Their answers would obviate the need for much of this investment and complexity while generating much greater returns, as our recent research into Efficiency Breakthroughs shows.

Lies, damn lies and …

Panorama didn’t investigate any such alternatives. Instead, it took the hype one step further, claiming that this sort of data mining can even identify “what people might want to buy even before they realise it themselves”. This isn’t a new claim. In fact, it’s quite old. Early CRM practitioners were making similar boasts 15 years ago and nothing came of it.

The reason nothing came of it is that it’s based on a category error: confusing a statistical observation about a whole population with a specific prediction about a particular member of that population. I can tell you with near certainty that if I toss a fair coin many times, close to 50% of the tosses will come up heads and close to 50% will come up tails. But that doesn’t help me improve my odds of predicting the result of the next toss.
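A small simulation makes the coin-toss point concrete. It is an illustration only: the function name, the seed, and the guessing rule (always bet on whichever side has come up more often so far) are assumptions chosen for the sketch, and any other rule based on past tosses would fare the same against a fair coin.

```python
import random


def simulate(n_tosses, seed=0):
    """Toss a fair coin n_tosses times, guessing each toss from the history.

    Returns (share of heads, share of correct guesses).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    correct = 0
    for i in range(n_tosses):
        # Guess heads if heads has come up at least as often as tails so far.
        guess_heads = heads * 2 >= i
        toss_heads = rng.random() < 0.5
        if guess_heads == toss_heads:
            correct += 1
        if toss_heads:
            heads += 1
    return heads / n_tosses, correct / n_tosses


share_heads, accuracy = simulate(100_000)
print(f"share of heads:          {share_heads:.3f}")
print(f"next-toss guess accuracy: {accuracy:.3f}")
```

Both numbers settle near 0.5: the aggregate pattern is highly predictable, yet knowing it does nothing to improve the guess about any single toss.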

Early CRM practitioners realised that customers with certain patterns of behaviour had, say, an X% chance of defecting to another provider. They ‘knew’ what a certain percentage of customers were going to do next. Great. But it didn’t help them direct their resources more efficiently because it was knowledge about a pattern, not knowledge about what any particular customer was going to do next. This remained as unpredictable as ever. Much Big Data hype simply repeats this category error on a much bigger scale.

Hype and reality

The current data explosion is indeed creating many challenges and opportunities. Think of it as akin to the invention of the microscope or telescope – enabling new observations and new discoveries. That’s fantastic. Unfortunately, much of the hype surrounding it is worse than uninformative. It actually spreads confusion and misunderstanding.

Meanwhile, Big Data solutions are not the only solution in town. In fact, as our Efficiencies research shows, in arenas such as advertising they could be as much a part of the problem as the solution.