Beware the Pirates of Big Data

How can you make best use of the explosion in the availability, complexity, variety and volume of data that occurred in the last decade, which is popularly known as Big Data? Think! Don’t be led, blindly, by the data, Big or otherwise.
If you’re familiar with Big Data, you will be familiar with the four Vs of Volume, Velocity, Variety and Variability – five if you add Veracity. The word Big focuses us on the V of volume and talk quickly turns to exabytes and zettabytes (10^21 bytes, or 1,000 exabytes).
You’ll have been told that all those zettabytes offer you great opportunities to gain profitable insights. But do they?
There is an implicit assumption that more data means more information. There is a danger that if you blindly apply statistical and data mining tools to massive data sets you will be doing little more than numerology. Given enough data you can prove anything!
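To see how easily this happens, here is a minimal Python sketch. It is not drawn from any real data set: it assumes purely random, unrelated series, and the function names `pearson` and `strongest_chance_correlation` are illustrative. It reports the strongest correlation found when one target series is compared against more and more candidate series.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_candidates, n_points, seed=0):
    """Search random candidate series for the best match to a random target.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    # The more candidates you search, the stronger the best "finding" looks.
    for n_candidates in (1, 10, 100, 1000, 10000):
        r = strongest_chance_correlation(n_candidates, n_points=20)
        print(f"{n_candidates:>6} candidates: strongest r = {r:+.2f}")
```

With only 20 observations per series, the best match typically strengthens as the number of candidates grows, even though every series is noise by construction.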
More than that, if you forget the difference between correlation and causation and fail to sense-check your results, you can find yourself drawing ridiculous conclusions.
Consider pirates. If you look at the number of pirates in the world since the early 19th century and plot this against the global average temperature you can see that there is a statistically significant inverse relationship. If you take this at face value, you could conclude that global warming is due to the decline in the number of pirates in the world.
The data suggests we could solve global warming if we encouraged piracy.
You can be confident that the decline in pirate numbers is not the cause of global warming. This chart demonstrates that even a strong correlation can be nonsensical, and that if you don’t sense-check your conclusions you may look foolish.
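The pirate effect can be reproduced with made-up numbers. The sketch below is an illustration only: it uses synthetic series, not real pirate counts or temperature records, and the slopes, noise levels and function names are assumptions chosen for the example. Two independently generated series, one drifting down and one drifting up, correlate strongly simply because both trend over time. Comparing their year-to-year changes instead is one common sense-check, and there the relationship largely vanishes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def differences(xs):
    """Period-to-period changes, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


def trend_demo(n_years=200, seed=1):
    """Correlate two unrelated trending series, in levels and in changes."""
    rng = random.Random(seed)
    # Hypothetical stand-ins, generated independently of each other:
    # a steadily falling count and a steadily rising measurement.
    falling = [1000.0 - 4.0 * t + rng.gauss(0, 20) for t in range(n_years)]
    rising = [14.0 + 0.01 * t + rng.gauss(0, 0.05) for t in range(n_years)]
    r_levels = pearson(falling, rising)
    r_changes = pearson(differences(falling), differences(rising))
    return r_levels, r_changes


if __name__ == "__main__":
    r_levels, r_changes = trend_demo()
    print(f"correlation of levels:  {r_levels:+.2f}")
    print(f"correlation of changes: {r_changes:+.2f}")
```

The strong inverse correlation in levels comes entirely from the shared passage of time. Neither series has any influence on the other.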
As we move to an era where we are processing bigger and bigger data sets and where analysis, by necessity, is becoming more and more automated, can you be confident that you are not finding pirates in your data? Validating against historic data and measuring statistical significance won’t save you. So what can you do?
Look to the principles of Operational Research. Don’t start with data and see what it tells you. Start with a question or a hypothesis and see if the data support or refute it.
Would you start with the question, is the rise in global temperature related to the number of pirates? I hope not!
More data is not always better and the blind application of statistical analysis and data mining can lead to spurious correlations. Remember the principles of Operational Research: start with a hypothesis and use the data to support or refute it. But, above all, beware the pirates of Big Data!
P.S. You may wish to inspect the x-axis a little more closely.