Cathedrals, Bazaars, Crowds, Ignorance and Predictive Analytics

A friend (thanks Ann) sent me a link to this Nicholas Carr article "The Ignorance of Crowds" in which he discussed Eric Raymond's seminal paper on open source called "The Cathedral and the Bazaar". For those of you who don't remember it, the paper outlines two models for free software development:
- The Cathedral model, in which source code is available with each software release, but code developed between releases is restricted to an exclusive group of software developers.
- The Bazaar model, in which the code is developed over the Internet in view of the public (Linux, for example).
Although the original paper compared different ways to develop free software, subsequent discussion tends to use the Cathedral vs. Bazaar analogy to compare commercial software (almost always cathedral-style) with open source software (more typically bazaar-style). Anyway, Nicholas Carr's article talks about some of the limitations of the bazaar approach and about when more cathedral-like structures, or some combination of the two, are called for. Now, I am not going to get into the debate over the right way to develop software, but I could not help being struck by how similar the discussion is to ones I hear around predictive analytics.
As I said once before (in Hits and Niches), predictive analytics takes the past behavior of all customers and uses it to infer the likely future behavior of a specific customer. This is akin to what is sometimes called the "wisdom of crowds" and, at some level, to the idea of a bazaar. As Nick says in his article:
"The power that a crowd of contributors has to solve problems derives not just from its sheer size, although that is important, but from its diversity."
And this is true in analytics also. There is little value in a model built from a pre-selected subset (selection bias is a big problem in analytics), so a diverse set of data is preferred. Where predictive analytics differs from the traditional wisdom-of-crowds approach is that this diverse group, this "bazaar" of actions and motivations, is then used to construct a formal model. A "cathedral", if you will. An expert, or group of experts, takes all that information and filters it through statistical approaches, experience, testing and more to come up with a predictive model that can be executed. This "cathedral" model can turn uncertainty (which offer will someone we have never met prefer?) into a probability (someone with these characteristics is much more likely to find this offer attractive than that one). It is this processing that makes predictive models more powerful than just aggregating and displaying the data. In some ways, therefore, predictive analytics is a cathedral-like process that builds on the value of a bazaar of different activity.
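To make the "uncertainty into a probability" step a little more concrete, here is a minimal sketch in Python. It is not how any particular analytics product works: the customer history, the two characteristics (frequent buyer, new account) and the function names are all made up for illustration, and logistic regression fitted by plain gradient descent stands in for whatever statistical approach an expert would actually choose.

```python
import math

# Hypothetical "bazaar" data: past customers described by two made-up
# characteristics [is_frequent_buyer, is_new_account], plus whether each
# one accepted a given offer (1) or not (0).
HISTORY = [
    ([1, 0], 1), ([1, 0], 1), ([1, 0], 1), ([1, 0], 0),
    ([0, 1], 0), ([0, 1], 0), ([0, 1], 1), ([0, 1], 0),
    ([1, 1], 1), ([1, 1], 0), ([0, 0], 0), ([0, 0], 1),
]


def predict_proba(weights, bias, x):
    """Return the modeled probability that a customer with traits x accepts."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(history, epochs=2000, lr=0.1):
    """Fit a logistic regression (the 'cathedral') by batch gradient descent."""
    n_features = len(history[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(history)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in history:
            err = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_logistic(HISTORY)
    # Two customers we have never met, described only by their traits.
    print("frequent buyer:", predict_proba(weights, bias, [1, 0]))
    print("new account:   ", predict_proba(weights, bias, [0, 1]))
```

The point of the sketch is only the shape of the process: many people's behavior goes in, and what comes out is a single executable model that assigns a probability to someone who was never in the data.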
In his article, Nick says "if peer production is a good way to mine the raw material for innovation, it doesn't seem well suited to shaping that material into a final product". The behavior of a large, diverse group of people is great raw material but a formal process for turning that behavior into a model will generate better, more innovative insight than simply letting all the people in that group look at the data individually.
Does that work as an analogy? Let me know what you think...