Big Data: The Big Mistake

Even with the vast quantities of found data available today, Big Data analysis does not, in and of itself, produce valid information. This holds despite the sheer volume of endpoints and the corresponding technologies that let us process them at scale and on the cheap.

In this well-written FT Magazine piece, Tim Harford challenges the basic assumptions of Big Data analysis. The four pillars of Big Data are captured in the axioms below:

  1. Big data analysis produces uncannily accurate results.
  2. When every data point can be captured, old statistical sampling techniques become obsolete; that is, it is possible to achieve a sample N where N = all of the data.
  3. Statistical and scientific models aren't needed, because statistical correlation is sufficient.
  4. With enough data, the numbers speak for themselves.

The article, however, advocates a counter-proposition. Through examples of Big Data triumph and failure, it demonstrates that old-school, tried-and-tested statistical methodologies and techniques are still relevant. While the potential of Big Data is real, it is naive to think that sampling and its kin become unnecessary simply because you have more data. In fact, the opposite may be true: a huge but non-random sample does not dilute selection bias, it entrenches it.
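That statistical point can be made concrete with a quick simulation (an illustrative sketch of the general argument, not code from the article). It assumes a population where exactly half hold some attribute, then compares a small properly randomized sample against a far larger "collect everything we can" sample gathered with a mild selection bias:

```python
import random

random.seed(42)

# Hypothetical population: 1,000,000 people, exactly 50% with the attribute.
POPULATION_SIZE = 1_000_000
TRUE_RATE = 0.5
population = ([1] * int(POPULATION_SIZE * TRUE_RATE)
              + [0] * int(POPULATION_SIZE * (1 - TRUE_RATE)))

# Small but properly randomized sample: unbiased by construction.
random_sample = random.sample(population, 1_000)
print(f"Random sample of 1,000:      estimate = "
      f"{sum(random_sample) / len(random_sample):.3f}")

# Enormous "N = all we could collect" sample with a mild selection bias:
# people with the attribute are twice as likely to end up in the dataset.
biased_sample = [x for x in population
                 if random.random() < (0.8 if x else 0.4)]
print(f"Biased sample of {len(biased_sample):,}: estimate = "
      f"{sum(biased_sample) / len(biased_sample):.3f}")
```

The random sample of 1,000 lands near the true rate of 0.5, while the biased sample of roughly 600,000 points converges confidently on roughly 0.67. The extra data buys a tighter confidence interval around the wrong number, which is exactly why sample design still matters.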

From Google and Facebook to the NSA, governments and businesses everywhere are maximizing the value of the data available to them, using it in new, scary, and exciting ways to achieve their objectives. But this doesn't mean the mathematicians and statisticians are out of a job just yet. Quite the contrary.


Photo by Trevor Paglen. NSA Headquarters.

Further Reading

Big data: are we making a big mistake?

Why big data is in trouble: they forgot about applied statistics

The Parable of Google Flu: Traps in Big Data Analysis

No data can replace rigorous thought / From Dr Brendan Kelly

Seizing the opportunity of big data / From Mr Rupert Naylor