A friend (thanks, Ann) sent me a link to Nicholas Carr's article "The Ignorance of Crowds", in which he discusses Eric Raymond's seminal paper on open source, "The Cathedral and the Bazaar". For those of you who don't remember it, the paper outlines two models for free software development:
- The Cathedral model, in which source code is available with each software release, but code developed between releases is restricted to an exclusive group of software developers.
- The Bazaar model, in which the code is developed over the Internet in view of the public, as with Linux.
Although the original paper compared different ways to develop free software, subsequent discussion tends to use the Cathedral vs. Bazaar analogy to compare commercial software (almost always cathedral-style) with open source software (more typically bazaar-style). Anyway, Nicholas Carr's article talks about some of the limitations of the bazaar approach and about when a combination of the two, or a more cathedral-like structure, is called for. Now, I am not going to get into the debate over the right way to develop software, but I could not help being struck by how similar the discussion is to ones I hear around predictive analytics.
As I said once before (in Hits and Niches), predictive analytics takes the past behavior of all customers and uses it to infer the likely future behavior of a specific customer. This is akin to what is sometimes called the "wisdom of crowds" and, at some level, to the idea of a bazaar. As Nick says in his article:
"The power that a crowd of contributors has to solve problems derives not just from its sheer size, although that is important, but from its diversity."
And this is true in analytics also. There is little value in a model built from a pre-selected subset (selection bias is a big problem in analytics), so a diverse set of data is preferred. Where predictive analytics differs from the traditional wisdom-of-crowds approach is that this diverse group, this "bazaar" of actions and motivations, is then used to construct a formal model. A "cathedral", if you will. An expert, or group of experts, takes all that information and filters it through statistical approaches, experience, testing and more to come up with a predictive model that can be executed. This "cathedral" model can turn uncertainty (which offer will someone we have never met prefer?) into a probability (someone with these characteristics is much more likely to find this offer attractive than that one). It is this processing that makes predictive models more powerful than just aggregating and displaying the data. In some ways, therefore, predictive analytics is a cathedral-like process that builds on the value of a bazaar of different activity.
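To make the "uncertainty into a probability" step a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `HISTORY` records are invented, the segment and offer names are hypothetical, and the smoothed acceptance rate stands in for what would, in practice, be a far more carefully built and tested model.

```python
from collections import defaultdict

# Hypothetical pooled history from many customers (the "bazaar"):
# (customer_segment, offer, accepted). Invented purely for illustration.
HISTORY = [
    ("student", "discount", True),
    ("student", "discount", True),
    ("student", "discount", False),
    ("student", "premium", False),
    ("student", "premium", False),
    ("retiree", "discount", False),
    ("retiree", "premium", True),
    ("retiree", "premium", True),
]


def fit_acceptance_model(history):
    """Count acceptances and presentations per (segment, offer) pair."""
    counts = defaultdict(lambda: [0, 0])  # [times accepted, times shown]
    for segment, offer, accepted in history:
        counts[(segment, offer)][1] += 1
        if accepted:
            counts[(segment, offer)][0] += 1
    return dict(counts)


def acceptance_probability(model, segment, offer):
    """Estimate P(accept) with add-one (Laplace) smoothing.

    Smoothing keeps the estimate away from 0 or 1 when data is scarce;
    a pair never seen in the history falls back to 0.5.
    """
    accepted, shown = model.get((segment, offer), (0, 0))
    return (accepted + 1) / (shown + 2)


# The formal model (the "cathedral") scores offers for someone we have
# never met, using only the characteristics we know about them.
model = fit_acceptance_model(HISTORY)
p_discount = acceptance_probability(model, "student", "discount")
p_premium = acceptance_probability(model, "student", "premium")
best_offer = "discount" if p_discount > p_premium else "premium"
```

With this toy data, a new "student" customer gets a higher estimated probability for the discount offer than for the premium one. Note that the sketch inherits any selection bias in `HISTORY`: if only certain kinds of customers were ever shown an offer, the estimates say little about everyone else.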
In his article, Nick says "if peer production is a good way to mine the raw material for innovation, it doesn't seem well suited to shaping that material into a final product". The behavior of a large, diverse group of people is great raw material, but a formal process for turning that behavior into a model will generate better, more innovative insight than simply letting all the people in that group look at the data individually.
Does that work as an analogy? Let me know what you think...