There’s little need to belabor the point that many businesses today generate more data in a single day than they used to in a whole year. A few years ago, storing all of this data at a reasonable cost was the biggest challenge. Newer technologies now solve for that.
However, data by itself is neither useful nor meaningful. Simply sitting on a stockpile of data provides minimal value. The real value lies in making your data “decision-ready”: simply put, taking your data and processing, augmenting, and analyzing it to improve its immediate decision value.
While it may seem difficult, there is a tangible way to continuously get your data to a state of decision-readiness. I’m not talking about a one-off project that takes a herculean effort on the scale of the Manhattan Project. Rather, I’m talking about systemic and automatable ways to achieve this goal.
Questions to Ask
- Start by asking yourself what problems are important and impactful to solve for your business. This domain knowledge is a fundamental first step that will help shape the key use cases that will make a difference to your business.
- Ask yourself what data you will need to realize success with these use cases. Most of us have become accustomed to relying on a few sources that we can easily acquire and process. Going with data sources that are tried and tested might seem like a good idea. Bear in mind, however, that sources that haven’t traditionally been used can reveal patterns we may not otherwise have known about.
- Ask questions that use your domain knowledge to drive the development of offline analytics. Data often hides scores of patterns that may be relevant to a business, and analytics are key to discovering and responding to them. The descriptive analytics methods used in Business Intelligence offer a good way of summarizing historical behavior and informing you that something may have occurred. But decision-readiness isn’t achieved by harking back to the past. It’s about being proactive by marrying the past with what may be happening in the moment. Predictive models are an excellent way to achieve that.
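To make the descriptive-versus-predictive contrast concrete, here is a minimal sketch. The scenario, data, and scoring formula are all hypothetical illustrations, not anything from a real system: descriptive analytics summarizes what already happened, while a predictive score marries that history with a signal from the present moment.

```python
# Hypothetical example: descriptive vs. predictive analytics.
# Daily order counts for one customer over the past week (made-up data).
history = [12, 15, 9, 14, 13, 2, 1]

# Descriptive analytics: summarize historical behavior.
average_orders = sum(history) / len(history)  # the historical baseline

# Predictive analytics (toy model): combine the historical baseline
# with what is happening right now to anticipate near-term behavior.
def churn_risk(baseline, orders_today):
    # The further today's activity falls below the baseline,
    # the higher the risk score (0.0 = no risk signal).
    drop = max(0.0, baseline - orders_today) / baseline
    return round(drop, 2)

print(average_orders)                  # summarizes the past
print(churn_risk(average_orders, 1))   # flags what may be happening now
```

The point of the sketch is the shape of the two questions, not the arithmetic: the descriptive number only tells you what occurred, while the predictive score is actionable while the pattern is still unfolding.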
Once developed, the model needs to be deployed in an execution environment. This is where the model is supplied its inputs and scoring occurs through the encoded mathematics to produce the desired output, all in a hands-free manner. The choice of execution environment depends on factors such as the amount of data to supply to the model and the rapidity with which the computation must occur.
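A bare-bones execution environment can be sketched in a few lines. Everything here is an assumption for illustration: the weights, the feature names, and the logistic form are stand-ins for whatever mathematics a real model encodes. The structure is the point: the environment feeds inputs to the encoded mathematics and collects outputs with no human in the loop.

```python
import math

# Hypothetical model artifact produced offline (illustrative weights only).
MODEL = {"intercept": -1.5, "weights": {"visits": 0.4, "spend": 0.02}}

def score(record):
    """Apply the model's encoded mathematics to one input record."""
    z = MODEL["intercept"] + sum(
        MODEL["weights"][name] * record[name] for name in MODEL["weights"]
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic output between 0 and 1

# The execution environment supplies inputs; scoring is hands-free.
inputs = [{"visits": 3, "spend": 120.0}, {"visits": 0, "spend": 5.0}]
outputs = [score(r) for r in inputs]
```

The two factors named above map directly onto this sketch: the size of `inputs` is the amount of data to supply, and how quickly `outputs` must be ready dictates whether this loop can run in-process, in a database, or on a distributed platform.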
Various types of execution environments exist today, from in-database execution and web application execution to distributed stream-processing environments. Many environments are only geared towards batch execution, so watch out for that. While in many cases batch is perfectly fine, in others it persists simply because we were trained to execute that way. Traditionally we gathered our data until we had all of it and parked it someplace until we were ready to start processing it. Pragmatic decisions driven by cost and the limitations of technology led us to accept that yesterday’s news was good enough and that the cost of real-time processing was simply not worth it. Today technology has matured to offer options where we can process a variety of data at real-time speeds through what we call stream processing. So you don’t have to process batch differently from real-time: simply process it all as streams and improve your decision-readiness.
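The "process it all as streams" idea can be sketched with plain iterators. This is a hypothetical unification, not any particular streaming product: both the parked archive and the live events are exposed as streams, so a single pipeline handles them without separate batch and real-time code paths.

```python
# Hypothetical sketch: treating batch and live data as one stream.

def batch_source():
    # Historical records that were gathered and parked someplace.
    yield from [{"value": 10}, {"value": 42}]

def live_source(events):
    # Events arriving one at a time, in the moment.
    yield from events

def process(stream):
    # One pipeline for both sources: the processing logic never
    # needs to know whether the record is archived or fresh.
    return [record["value"] * 2 for record in stream]

archived = process(batch_source())          # yesterday's news
fresh = process(live_source([{"value": 7}]))  # happening right now
```

In a real stream-processing platform the sources would be connectors and the pipeline would run continuously, but the design choice is the same: batch becomes just a bounded stream.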
Stream processing as an execution platform isn't new. It has been around for many years, but it was somewhat the bastion of distributed-computing elites who understood how to use it for rather complicated scientific computation. It took very skilled physicists, mathematicians, and computer scientists to encode these models and run these execution platforms. Today we are at an interesting point where this technology can be consumerized and packaged rapidly into solutions and applications that address use cases in a variety of verticals. More importantly, it isn’t going to be some esoteric solution that only the very top tier in each vertical can afford.
To learn more or ask questions about this topic, join us and our partner Zementis at Predictive Analytics: Seeing the Whole Picture on June 18th at 10:00 PST.