By Shafi Rahman
In my session at FICO World on Mechanisms for Building Multiple Models, we will discuss the modeling automation technique that FICO pioneered. We have a suite of predictive modeling functionalities and architectures that are algorithm-driven and can be easily automated. They can also be used to train models in a manual, expert-driven manner. Combining the algorithm-driven and expert-driven approaches yields highly predictive models that incorporate an expert's business domain knowledge.
We have used modeling automation effectively to create ensembles of models that solve various business problems, such as attrition. We begin by creating a single training dataset, which contains the sampled profiles and the performance variables. An automated script then repeatedly subsets the dataset by rows and columns. Each subset is used to train an independent model, of either the same architecture or a different one, again using automation. Ensemble voting techniques are then used for scoring.
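The ensemble steps above can be sketched as follows. This is a minimal illustration, not FICO's tooling: the nearest-centroid base learner, the function names, and the parameters (`n_models`, `row_frac`, `n_cols`) are all assumptions chosen to keep the example short, and any model architecture could take the base learner's place.

```python
import random
from collections import Counter


def train_centroid_model(rows, labels, cols):
    """Stand-in base learner (an assumption): per-class mean of the chosen columns."""
    sums = {}
    counts = Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for i, c in enumerate(cols):
            acc[i] += row[c]
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return cols, centroids


def predict_one(model, row):
    """Predict the class whose centroid is closest on the model's own columns."""
    cols, centroids = model

    def dist(center):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, center))

    return min(centroids, key=lambda lab: dist(centroids[lab]))


def train_ensemble(rows, labels, n_models=15, row_frac=0.7, n_cols=2, seed=0):
    """Train one model per random row/column subset of a single training dataset."""
    rng = random.Random(seed)
    n_rows = len(rows)
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(n_rows), max(1, int(row_frac * n_rows)))
        cols = sorted(rng.sample(range(len(rows[0])), n_cols))
        models.append(
            train_centroid_model([rows[i] for i in idx], [labels[i] for i in idx], cols)
        )
    return models


def vote(models, row):
    """Score a record by majority vote across the ensemble."""
    ballots = Counter(predict_one(m, row) for m in models)
    return ballots.most_common(1)[0][0]
```

Calling `train_ensemble(rows, labels)` and then `vote(models, new_row)` scores a new profile. Majority voting is only one option; averaging predicted probabilities is another common choice.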
Modeling automation is also helpful in creating decision trees with regression or other types of models at each leaf node. This is done to capture complex interactions through iterative partitioning, using the splitter variables that best capture the interactions in the data. Expert guidance may be required to determine the splitter variables. Once the tree has been grown, the training dataset for each leaf node consists of the tuples that reach that leaf. The leaf models are then trained automatically on these datasets.
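A small sketch of the leaf-model idea, under stated assumptions: the tree is taken as given (its splitter variables and thresholds would come from the partitioning step or from an expert), and each leaf gets a one-variable least-squares line as a stand-in for whatever model type is appropriate. The dictionary layout and all names here are hypothetical.

```python
from collections import defaultdict


def route(node, record):
    """Follow the splits until a leaf is reached and return that leaf's name."""
    while "leaf" not in node:
        side = "high" if record[node["var"]] > node["threshold"] else "low"
        node = node[side]
    return node["leaf"]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def train_leaf_models(tree, records, x_key, y_key):
    """Group records by the leaf they reach, then fit one model per leaf."""
    by_leaf = defaultdict(list)
    for rec in records:
        by_leaf[route(tree, rec)].append(rec)
    return {
        leaf: fit_line([r[x_key] for r in recs], [r[y_key] for r in recs])
        for leaf, recs in by_leaf.items()
    }


def predict(tree, models, record, x_key):
    """Score a record with the model belonging to the leaf it reaches."""
    slope, intercept = models[route(tree, record)]
    return slope * record[x_key] + intercept
```

A tree here is a nested dictionary such as `{"var": "tenure", "threshold": 12, "low": {"leaf": "new"}, "high": {"leaf": "established"}}`, where `tenure` is an illustrative splitter variable. Because each leaf sees only its own tuples, the leaf models can differ in slope and even sign, which is how the structure captures interactions.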
In a previous blog post, I discussed the approach of building thousands of models by parallelizing and automating the data and modeling steps. The models predict the likelihood of events in order to drive customer-centric decisions. A training dataset is created for each event, and modeling automation is then used to train the models for all of the thousands of events. Steps like sampling and profiling can’t be easily abstracted without sacrificing domain information or problem specification. An expert therefore designs, develops and proves out the process for creating the training datasets before it is automated.
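The per-event workflow might be orchestrated along these lines. This is only a sketch: the customer record layout, the function names, and the constant base-rate "model" are placeholders, and a thread pool is used purely to keep the example self-contained (CPU-heavy training would more likely run in separate processes or on a cluster).

```python
from concurrent.futures import ThreadPoolExecutor


def build_training_set(event, customers):
    """Expert-designed data step, frozen into a function once it is proven out."""
    return [(c["spend"], 1 if event in c["events"] else 0) for c in customers]


def train_event_model(event, customers):
    """Placeholder trainer (an assumption): records the event's base rate."""
    data = build_training_set(event, customers)
    positives = sum(label for _, label in data)
    return event, {"base_rate": positives / len(data), "n_rows": len(data)}


def train_all(events, customers, max_workers=4):
    """Train one model per event, running the independent jobs concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda e: train_event_model(e, customers), events)
        return dict(results)
```

Each event's job is independent of the others, which is what makes the approach scale to thousands of events.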
The aspects that require domain or business expertise, like the design of the experiment, need manual intervention. Once the details are proven out, these steps can be automated for repeated use. The predictive modeling steps are easily abstracted and scaled to new business problems, yet the resulting models still need to be checked for over-fitting and performance issues. A word of caution: it is easy to fall into the trap of automating everything without expert intervention, and thus to end up with nonsensical models.
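One way to keep such checks in an automated pipeline is a simple gate that flags models for expert review. The sketch below assumes each model already has a training score and a validation score on the same scale; the function name and both thresholds are illustrative, not recommended values.

```python
def flag_suspect_models(metrics, max_gap=0.05, min_validation=0.6):
    """Return {model_name: [reasons]} for models an expert should review.

    metrics maps a model name to a (train_score, validation_score) pair.
    """
    flagged = {}
    for name, (train, valid) in metrics.items():
        reasons = []
        if train - valid > max_gap:
            # A large train/validation gap suggests the model memorized noise.
            reasons.append("possible over-fitting")
        if valid < min_validation:
            reasons.append("weak validation performance")
        if reasons:
            flagged[name] = reasons
    return flagged
```

A gate like this does not replace expert judgment; it only narrows thousands of models down to the ones worth a closer look.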
I’ll discuss these mechanisms in more detail at FICO World. You can register for the event at http://www.ficoworld.com/.