Big Data, Small Data and Running Data

Posted on: Monday 16th of April 2012

There is now a healthy debate raging about the relative merits of Big Data versus Small Data, with some excellent stuff being written about the importance of small (i.e. personal) data.

Take these thoughts from Personal CEO Shane Green.

“Small data puts the power and tools of big data into the hands of people. It is based on the assumption that people have a significant long-term competitive advantage over companies and governments at aggregating and curating the best and most complete set of structured, machine-readable data about themselves and their lives – the “golden copy”. With proper tools, protections and incentives, small data allows each person to become the ultimate gatekeeper and beneficiary of their own data.

“Small data can also greatly improve the capacity and performance of governments and non-governmental institutions, from eliminating time-consuming forms and other inefficient data practices, to improving public health and education by leveraging the power of more accurate and complete data provided with an individual’s permission. Such institutions can also help share important data with individuals, allowing them to have a copy for their own use.”

Shane is absolutely right. This is where the world is going.

Information logistics

But there is another level which we also need to investigate. At a certain point, counterposing ‘Big [statistical] Data’ to ‘Small [personal] Data’ stops being helpful. That’s because, in the real world, to get things done, we need to create specific aggregations of data, some of which is personal and some of which is not; some of which draws on statistical Big Data insights and some of which draws on VPI or volunteered data. It’s through these combinations of different types of data that real value is created, both in individuals’ lives and in organisations.

To get to this point we need to look beyond Big versus Small Data to ‘Running Data’ or information logistics: getting exactly the right information to, and from, the right person at the right time. This is about moving data – about data flows – and it’s about creating the right combinations of data for the right tasks.

That word ‘exactly’ is very important. Think of each piece of data as a component in a product. For the product to work, you need all the components – all the different bits of information – put together in the right way.

If one essential component is missing the product won’t work properly. On the other hand, once you’ve got all the components you need, you don’t want any more. Having more data just creates potential confusion, complexity and cost. It positively destroys value, because you are likely to waste time and money wondering what to do with it. So getting exactly the right information is really important.
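To make the ‘exactly’ point concrete, here is a minimal sketch in Python. It assumes the specification and the supplied data can each be reduced to a list of field names; the function name `check_against_spec` and the example fields are hypothetical, chosen purely for illustration. It reports both kinds of failure: essential components that are missing, and surplus data that was never asked for.

```python
def check_against_spec(required, supplied):
    """Compare the fields a task needs with the fields actually supplied.

    Both arguments are iterables of field names (an illustrative assumption).
    Returns the missing and the surplus fields as sorted lists.
    """
    required_set = set(required)
    supplied_set = set(supplied)
    return {
        "missing": sorted(required_set - supplied_set),  # product won't work
        "surplus": sorted(supplied_set - required_set),  # cost and confusion
    }


# Hypothetical example: a task that needs three pieces of data.
report = check_against_spec(
    required=["name", "address", "meter_reading"],
    supplied=["name", "address", "date_of_birth"],
)
print(report)
```

Only when both lists come back empty has the task received exactly the information it needs.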

Now. The thing about Running Data (or information logistics) is that it is a hard problem to solve. Here are some of the things you need to do to get it right.

1. Create the specification. In other words, define exactly what you are trying to achieve, and exactly what bits of data you need to achieve it. This is often easier said than done. In fact, looking forward to the emerging market for decision-support services, helping people define and build such specs is likely to emerge as one of the biggest sources of added value: ‘helping me realise what it is that I want’.

2. Locate that data. This means it needs to be tagged and described correctly, bearing its potential uses in mind. Given that the same piece of data may have many different uses, this isn’t always a trivial issue.

3. Access this data. You need the technologies – and the permissions – to find and retrieve the data you need.

4. Use this data. Actually using the data creates its own hugely important set of sub-questions. They include:

  • Data quality. You need to be confident that it is accurate, correct and up-to-date. That means you need to know, or have confidence in, its provenance and the processes by which the data has been made available.
  • Rights and permissions. You need to have the right or permission to use the data in the way intended. That means it has to have its own commercial/contractual wrapping defining terms, conditions, prices etc. For example, is the data for once-only use? Does it have a ‘sell-by’ date? And so on.
  • Clarity. You need a series of wrap-around processes that go with these things – to eliminate doubt and ambiguity, to do all these things quickly and efficiently, to monitor that it’s working and identify when and where it isn’t, and so on.

5. Assemble the data. You need to be able to do this for each of the data components you need to achieve your task, and you need to be able to assemble them in the right way so that the complete data product works as an integrated thing (think of all the different parts needed to make a Jumbo Jet versus the assembled jet itself).

6. Safety and security. You need to be able to do all these things safely and securely so that privacy is not invaded, commercial value is not leaked, and so on – so that everyone involved can confidently trust the whole process.
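For readers who think in code, the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any real personal data store: the `DataComponent` class, its fields (`source` for provenance, `permitted_uses` for permissions, `use_by` for the ‘sell-by’ date) and the `assemble` function are all hypothetical names invented here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DataComponent:
    """One piece of data plus its 'wrapping' (a hypothetical model)."""
    name: str
    value: Any
    source: str                      # provenance: who supplied the data
    permitted_uses: FrozenSet[str]   # purposes the data may be used for
    use_by: Optional[date] = None    # the 'sell-by' date, if there is one


def is_usable(component: DataComponent, purpose: str, today: date) -> bool:
    """Step 4: check permission and freshness before using a component."""
    if purpose not in component.permitted_uses:
        return False
    if component.use_by is not None and today > component.use_by:
        return False
    return True


def assemble(spec: Iterable[str], catalogue: Dict[str, DataComponent],
             purpose: str, today: date) -> Dict[str, Any]:
    """Steps 1 to 5: take a spec, locate each component, check it, assemble.

    Raises ValueError listing every component that could not be used, so an
    incomplete data product is never returned.
    """
    product: Dict[str, Any] = {}
    problems = []
    for name in spec:
        component = catalogue.get(name)          # steps 2 and 3: locate, access
        if component is None:
            problems.append(f"{name}: not found")
        elif not is_usable(component, purpose, today):
            problems.append(f"{name}: not permitted or past its use-by date")
        else:
            product[name] = component.value      # step 5: assemble
    if problems:
        raise ValueError("; ".join(problems))
    return product
```

The design choice worth noting is that `assemble` fails loudly rather than returning a partial result: a data product with a component missing is the Jumbo Jet with a part left out. Step 6, safety and security, is deliberately not modelled here; it concerns the whole process rather than any single function.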

Going back to Shane’s observations about Small Data, none of this can happen if individuals are not empowered to join the party as data managers in their own right. Information logistics without the involvement of the individual (including the information individuals can volunteer) is like having that Jumbo Jet without the fuel to get it airborne. Because, in the end, most (if not all) practical uses of data to get things done involve the input and use of at least some personal data.

This is what the new industry of personal data stores contributes (watch out for Ctrl-Shift’s forthcoming survey of the fast-emerging PDS market).

Meanwhile, the list above gives a hint of the scale of the challenge now before us.

If it seems daunting, there is only one thing we need to remember: the scale of the opportunities it creates.

Alan Mitchell