A Moment of Science: Big Data and the Commoditization of Analytics

By Josh Hemann

The analytic computing issues that now fall under the rubric of Big Data point to an emerging tension in the software world: commoditization versus customization.

The era of algorithm commoditization
In the 1970s and 80s, libraries of algorithms such as IMSL and NAG were developed, providing advanced mathematical and statistical routines that returned consistent results across different hardware and compilers. This was of tremendous value at the time and remained so for decades. But 40 years later, software vendors can no longer sell algorithms alone at a profit, and businesses can no longer justify investing heavily in them.

Algorithms have become a commodity in the sense that there are numerous options to perform analytic computation in a variety of programming languages and settings. The real value (to buyers and sellers) now comes from all of the additional services and ecosystem around those algorithms.

This commoditization did not happen overnight, of course. The open source movement led to tremendous depth and breadth in analytic offerings, creating a marketplace in which analytic approaches became, in some sense, exchangeable. Even the way in which users develop analytics changed: point-and-click tools became more popular and writing code less so, leaving the algorithms themselves less front-and-center and more like just another cog in a larger system.

Will the commoditization that happened in algorithms happen for the overall practice of analytics?

While some emerging trends point in that direction, I don’t think this will happen anytime soon. With respect to those trends, today there are numerous options with which businesses can outsource their analytic needs and switch technology stacks (sometimes painfully). Further, the social aspect of analytic development (e.g. Kaggle and projects on GitHub) has popularized the idea that domain expertise is not really needed, maybe just a degree in physics and a library of machine-learning methods. Yet part of what Big Data is exposing is that decades of standardizing on database tools and vendors have left organizations without the skill sets needed to quickly adapt to and leverage increasingly complex data.

Further, Big Data has brought back computing issues that have not been a primary focus for most analytics professionals for a long time: paying close attention to memory consumption, developing visualizations appropriate for small or low-resolution screens, and being hyper-focused on execution speed.

Basically, the hardware and software limitations of past decades have to be wrestled with once again. While hardware and software have improved immensely, the increases in the three Vs of Big Data (Volume, Variety, Velocity) have been overwhelming, and we consume more information on devices with limited computing power and screen space (i.e. mobile feature phones and smartphones). So the tool chains, hardware, and skill sets needed by businesses to compete on Big Data are more eclectic and ill-defined than ever.

Finally, what I think will prevent this commoditization anytime soon is the fact that developing algorithms to answer business questions is often the easy part, relatively speaking. The hard part is dealing with the political realities and IT constraints that can prevent analytics from truly acting on and enhancing a business’s processes at scale. These are human problems, not solely software ones, and they are unique to each business.
