Correlation doesn’t equal causation. This point sometimes gets lost in the Big Data discussion.
The argument goes that if you have enough data, correlation is good enough. Even our friend Kenneth Cukier wrote that the focus of analyzing Big Data will shift from causation to correlation: “This represents a move away from always trying to understand the deeper reasons behind how the world works to simply learning about an association among phenomena and using that to get things done.” And current TED curator and former WIRED editor Chris Anderson proclaimed in his 2008 essay that the data deluge would make the scientific method obsolete. “Petabytes allow us to say: ‘Correlation is enough.’”
Even our Chief Analytics Officer will admit that for some things correlation may be OK. For many things, however, it is important to demonstrate the impact of an action in cause-and-effect terms. Take loyalty programs: members may spend more than non-members, but is that just a correlation? Would they have spent more anyway because they were already the best customers, or did the program drive the higher spending? If you can’t tell the difference, you may be leaving money on the table or throwing good money after bad.
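To make the loyalty program question concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration, not real customer data: the `simulate` function, the baseline spend of about 100, the 5-unit "true" lift from the program, and the rule that bigger spenders are more likely to join on their own. It compares the naive member vs. non-member gap with the gap you would see if membership were assigned by coin flip.

```python
import random
from statistics import mean


def simulate(n=20000, true_lift=5.0, seed=0):
    """Return (naive_gap, randomized_gap) in average spend.

    All numbers here are hypothetical and chosen only to illustrate
    selection bias; they do not come from any real loyalty program.
    """
    rng = random.Random(seed)
    obs_members, obs_others = [], []    # customers choose whether to join
    exp_members, exp_others = [], []    # a coin flip decides who joins

    for _ in range(n):
        # What this customer would spend with no program at all.
        baseline = rng.gauss(100, 20)

        # Observational world: customers who already spend more
        # are more likely to sign up.
        joins = baseline + rng.gauss(0, 10) > 110
        if joins:
            obs_members.append(baseline + true_lift)
        else:
            obs_others.append(baseline)

        # Experimental world: membership is random, so both groups
        # have the same mix of big and small spenders.
        if rng.random() < 0.5:
            exp_members.append(baseline + true_lift)
        else:
            exp_others.append(baseline)

    naive_gap = mean(obs_members) - mean(obs_others)
    randomized_gap = mean(exp_members) - mean(exp_others)
    return naive_gap, randomized_gap


if __name__ == "__main__":
    naive_gap, randomized_gap = simulate()
    print(f"Naive member vs. non-member gap: {naive_gap:.1f}")
    print(f"Gap under random assignment:     {randomized_gap:.1f}")
```

Under these assumptions the naive gap comes out several times larger than the 5-unit lift built into the simulation, because it mostly reflects who chose to join. The randomized gap lands close to the built-in lift, which is why a test-and-control design is one common way to separate the program's effect from the quality of the customers in it.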
We also don’t believe that you can use Big Data to solve problems without understanding those problems. Big Data can tell you nothing if you don’t know what questions to ask it.
This is where the fun with spurious correlations comes in. Tyler Vigen has put together a great collection of spurious correlations. Here’s one that we appreciated:
So where do you stand on the correlation/causation debate? Let us know.