First session today was Gerald Fahner of Fair Isaac discussing recent advances in scorecard technology. The first thing to note is that, to Fair Isaac, scorecards are generalized additive models that are both predictive and explicable (see this post for instance). They can be developed using regression analysis, optimization and various other techniques, and offer both predictive power and interpretability. The focus of Fair Isaac's scorecard technology (part of Model Builder) is combining data-driven machine learning with domain expertise so that the resulting models meet real-world business constraints.
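To make the additive-model idea concrete, here is a toy scorecard sketch. It is purely illustrative, not Fair Isaac's implementation: the attributes, bins and point values are made up. Each predictor is binned, each bin carries a score weight, and the final score is simply the sum of the weights for the bins an applicant falls into, which is what makes the model easy to explain.

```python
# Illustrative toy scorecard (made-up bins and points, not FICO's).
# Each attribute is binned; each bin contributes points; the score is the sum.
SCORECARD = {
    "age": [(0, 25, 10), (25, 40, 25), (40, 200, 40)],           # (low, high, points)
    "utilization": [(0.0, 0.3, 35), (0.3, 0.7, 20), (0.7, 10.0, 5)],
}

def score(applicant):
    total = 0
    for attribute, bins in SCORECARD.items():
        value = applicant[attribute]
        for low, high, points in bins:
            if low <= value < high:   # value falls in this bin
                total += points
                break
    return total

print(score({"age": 32, "utilization": 0.25}))  # 25 (age) + 35 (utilization) = 60
```

Because each attribute's contribution is visible in isolation, an analyst can inspect, constrain or override individual bin weights, which is the interpretability property the talk emphasized.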
Gerald identified four building blocks: the fitting objectives (what to optimize and how to trade off competing goals), optimization algorithms of various kinds, the score formula itself, and score weight constraints. He then walked through the stages of Fair Isaac's R&D pipeline to show how new research is developed, evaluated, adopted internally and then rolled out in the Model Builder product. These innovations map to four distinct business problems:
- Data Complexity - more and more data is available and its complexity is rising, so finding the right attributes to model with is a challenge
- Data Limitations - sometimes operations do not store the data that would be most useful
- Timing Uncertainty - predicting when something will happen rather than how likely it is to happen in a particular time period
- Competing Objectives - handling multiple conflicting objectives (like high response in marketing with profitable response)
Gerald discussed three main areas of innovation:
- Managing Data Complexity
Variable reduction techniques like correlation matrix analysis, principal component analysis and stepwise regression have limitations, for instance with nonlinear relationships and highly correlated variables. While these are all useful, Fair Isaac has been developing additional techniques for finding variables that are related. One approach uses unsupervised clustering to group related variables, which helps you explore very large variable sets and see which ones go together. This can then be used to select variables for scorecards automatically: go from cluster to cluster, find the most predictive variable in each, and iterate to see which variables add the most value, selecting only one from each cluster to avoid using variables that are, in fact, related. A second approach is Weight of Evidence sparklines, which put many small graphs of the predictive power of individual variables side by side so that large numbers of variables of different types can be reviewed at a glance.
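The cluster-then-select idea can be sketched as follows. This is an assumption-laden illustration of the general technique described in the talk, not Fair Isaac's algorithm: the synthetic data, the 0.8 correlation threshold and the greedy grouping are all made up, and predictiveness is proxied here by simple correlation with the target.

```python
# Sketch of cluster-then-select variable reduction (illustrative only):
# group highly correlated variables, then keep the most predictive one per group.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.5, size=n)   # strongly related to x1
x3 = rng.normal(size=n)                   # independent variable
y = 2 * x1 + x3 + rng.normal(size=n)      # target driven by x1 and x3

X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

# Greedy clustering: variables whose pairwise |correlation| exceeds the
# (assumed) threshold land in the same cluster.
corr = np.corrcoef(X, rowvar=False)
clusters, assigned = [], set()
for i in range(len(names)):
    if i in assigned:
        continue
    cluster = [i] + [j for j in range(i + 1, len(names))
                     if j not in assigned and abs(corr[i, j]) > 0.8]
    assigned.update(cluster)
    clusters.append(cluster)

# From each cluster, keep only the variable most correlated with the target,
# so near-duplicates never enter the scorecard together.
selected = [names[max(c, key=lambda j: abs(np.corrcoef(X[:, j], y)[0, 1]))]
            for c in clusters]
print(selected)
```

Here x1 and x2 fall into one cluster and only one of them survives alongside x3, which is the point: the scorecard never carries two variables that are essentially measuring the same thing.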
- Data Limitation
Reject inference lets you infer how rejected applicants, for instance, would have behaved had they been accepted. Reject inference is hard if the accepted customers are very different from the rejected ones. Similarly, if your screening process used data you don't capture - manual overrides, for instance - then you can have a hard time inferring outcomes for the rejected applicants. One area of research is automating more of this process to make it more robust, and there is now a new, more automated process.
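One classic flavor of reject inference can be sketched with a "parceling"-style approach: estimate outcome rates from the accepted population, then assign inferred outcomes to the rejects in proportion to those rates. This is a deliberately minimal illustration with made-up score bands and counts; real reject inference (and Fair Isaac's automated process) is considerably more involved.

```python
# Minimal parceling-style reject inference sketch (illustrative data).
import random

random.seed(1)

# Accepted applicants as (score_band, observed_bad) pairs -- made-up counts.
accepted = ([("low", 1)] * 30 + [("low", 0)] * 70 +
            [("high", 1)] * 5 + [("high", 0)] * 95)

# Step 1: estimate the bad rate per score band from the accepted population.
bad_rate = {}
for band in ("low", "high"):
    outcomes = [bad for b, bad in accepted if b == band]
    bad_rate[band] = sum(outcomes) / len(outcomes)

# Step 2: assign each reject an inferred good/bad outcome by sampling at
# the band's estimated bad rate.
rejects = ["low"] * 40 + ["high"] * 10
inferred = [(band, 1 if random.random() < bad_rate[band] else 0)
            for band in rejects]

# Step 3: combine observed and inferred outcomes for scorecard redevelopment.
augmented = accepted + inferred
print(len(augmented), bad_rate)
```

The weakness the talk called out is visible even here: the reject outcomes are extrapolated from the accepts, so if the two populations differ in ways the score bands don't capture (manual overrides, uncaptured data), the inferred outcomes will be biased.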
- Competing Objectives
Multiple-goal scorecards handle competing objectives. For instance, you may want a good response rate for a marketing campaign but also want to manage the riskiness of the people who respond. If you market based only on responsiveness you might get those most desperate for your product, boosting the average risk of responders. Multiple-goal scorecards are developed to balance these competing objectives - a risk-adjusted response score. Essentially you say how much you are willing to give up on one objective to manage the second.
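The trade-off can be illustrated with a simple linear blend, where a single weight states how much predicted response you will give up per unit of predicted risk. This is an assumed toy formulation, not Fair Isaac's actual multiple-goal method; the prospect data and the weight of 2.0 are invented.

```python
# Toy risk-adjusted response score (assumed linear blend, illustrative only).
def risk_adjusted_response(p_response, p_bad, risk_weight=2.0):
    """Higher is better: responsiveness penalized by expected riskiness.
    risk_weight encodes how much response we trade for a unit of risk."""
    return p_response - risk_weight * p_bad

# (p_response, p_bad) for two made-up prospect segments.
prospects = {
    "desperate": (0.60, 0.30),   # responds often, but risky
    "stable":    (0.25, 0.02),   # responds less, very low risk
}

ranked = sorted(prospects,
                key=lambda k: risk_adjusted_response(*prospects[k]),
                reverse=True)
print(ranked)
```

Ranked purely on response, "desperate" would come first; once risk is priced in, "stable" wins, which is exactly the reversal the risk-adjusted response score is meant to produce.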
Here are some of his slides:
- The Fair Isaac Model Builder philosophy
- Building blocks
- How recent innovations map to business problems
- Variable clustering
- Weight of Evidence Sparklines