Defining what makes a “good” model

Central to developing and evaluating an analytic model is the definition of what constitutes a “good model.” At FICO, we think of it in terms of the dimensions of good.
 
Let’s take fraud models as an example. What makes one model look high-performing may be misleading if that performance degrades quickly. A model that stops more fraud dollars may look better at first, until you notice that substantial good dollars were impacted. A model may have a high overall fraud detection rate, yet not catch enough of the fraud types you consider most important.

One of the classic measurements for fraud models is the percentage of fraud accounts detected at a particular account false positive ratio (AFPR), where the AFPR is defined as the number of good accounts flagged for each true fraud detected. Although this measurement provides one view of cost/benefit, it is not without challenges, since the false positive ratio depends on the fraud rate. A typical variation on this metric is value detection rate vs. account false positive ratio, where value is recast as the percentage of fraud dollars detected.
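To make the trade-off concrete, here is a minimal sketch of computing the detection rate at a target AFPR. It assumes one score and one fraud label per account; the function name, the inputs, and the 10:1 default target are all illustrative, not a description of FICO’s scoring systems:

```python
import numpy as np

def detection_rate_at_afpr(scores, is_fraud, target_afpr=10.0):
    """Fraction of fraud accounts caught at a target account false positive
    ratio (good accounts flagged per true fraud detected)."""
    order = np.argsort(scores)[::-1]               # flag highest-scoring accounts first
    fraud = np.asarray(is_fraud, dtype=bool)[order]
    frauds_caught = np.cumsum(fraud)               # true frauds detected so far
    goods_flagged = np.cumsum(~fraud)              # good accounts flagged so far
    with np.errstate(divide="ignore", invalid="ignore"):
        afpr = goods_flagged / frauds_caught       # undefined until the first fraud is caught
    ok = np.where(afpr <= target_afpr)[0]          # cutoffs that still meet the target
    if ok.size == 0:
        return 0.0
    return frauds_caught[ok[-1]] / fraud.sum()     # deepest qualifying cutoff
```

Sweeping `target_afpr` traces out the full trade-off curve, and replacing the account counts in the cumulative sums with fraud dollar amounts yields the value detection rate variant described above.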

While these metrics are standard for fraud, there’s more to consider in the dimensions of good. One notable deficiency is that the metrics above don’t include the monetary cost associated with the good accounts flagged. To get a better cost/benefit estimate, you could look at the percentage of fraud dollars detected vs. the percentage of non-fraud dollars flagged. By using interchange rates, you can then determine when the pursuit of fraud dollars comes at a much larger expense in blocked good dollars. The frustrating reality is that some fraud is more expensive to stop (in terms of impacted good customers) than simply taking the fraud loss.
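Under the simplifying assumption that a blocked good dollar costs the issuer roughly its interchange revenue, a rough sketch of that cost/benefit curve might look like the following (the 2% rate and all input names are hypothetical):

```python
import numpy as np

def net_benefit_curve(scores, is_fraud, amounts, interchange_rate=0.02):
    """Fraud dollars saved minus the cost of blocked good spend, per cutoff.

    interchange_rate is purely illustrative; the real cost of a blocked
    good dollar depends on portfolio economics.
    """
    order = np.argsort(scores)[::-1]               # flag highest-scoring accounts first
    fraud = np.asarray(is_fraud, dtype=bool)[order]
    amt = np.asarray(amounts, dtype=float)[order]
    fraud_saved = np.cumsum(np.where(fraud, amt, 0.0))    # fraud dollars stopped
    good_blocked = np.cumsum(np.where(~fraud, amt, 0.0))  # good dollars impacted
    net = fraud_saved - interchange_rate * good_blocked
    return fraud_saved, good_blocked, net
```

The point where `net` turns downward is exactly the frustrating reality above: past it, each additional fraud dollar stopped costs more in blocked good spend than simply taking the loss.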

Analytic experts understand that you can’t measure a model by detection performance alone. Care needs to be taken when building the model to ensure it remains robust and stable under reasonable shifts and changes in production data.

As an example, a model that shows a huge increase in fraud detection over last year’s model typically means that last year’s model was poorly designed, since it degraded so quickly. Extensive tests on out-of-time data, sensitivity analysis, or even simulating possible shifts in data can help determine the model’s sensitivity to changes in the production data. An overly sensitive model points to one that was over-trained or that contains variables poorly chosen by the modeler.
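One way to run such a check, sketched under the assumption of a fitted classifier with a scikit-learn-style `predict_proba` and reusing `detection_rate_at_afpr` from the earlier sketch (the column index and scale factors are arbitrary choices for illustration):

```python
import numpy as np

def shift_sensitivity(model, X, y, column, scales=(0.9, 1.0, 1.1)):
    """Rescale one input feature to mimic production drift and report the
    detection rate at a 10:1 AFPR under each scenario."""
    results = {}
    for s in scales:
        Xs = np.array(X, dtype=float)             # copy so the original data is untouched
        Xs[:, column] *= s                        # simulated drift in one variable
        scores = model.predict_proba(Xs)[:, 1]    # fraud-class probability
        results[s] = detection_rate_at_afpr(scores, y, target_afpr=10.0)
    return results
```

A detection rate that swings wildly between the 0.9 and 1.1 scenarios points to the over-trained, fragile model the paragraph above warns about.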

Other dimensions of good to consider include:

  • Determining the impact on high-value customers by computing the future present value of the customer. You can then determine the percentage of high-value customers being impacted by a model, and take action to reduce the false positives associated with those customers (see the monitoring sketch after this list).
  • Monitoring the time to detect fraud, in terms of the average number of transactions before detection, or even the average fraud loss per incident, year over year.
  • Monitoring performance on segments of fraud, such as card-present vs. card-not-present. An issuer typically is not responsible for the loss on card-not-present fraud, and is thus less interested in these cases being flagged by the model.
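A minimal monitoring sketch covering all three of these dimensions, assuming a hypothetical per-account table (every column name here is invented for illustration):

```python
import pandas as pd

def monitoring_snapshot(accounts: pd.DataFrame) -> dict:
    """Expects hypothetical boolean columns 'flagged', 'is_fraud',
    'high_value', 'card_present' and an integer column 'txns_to_detect'."""
    goods_flagged = accounts[accounts["flagged"] & ~accounts["is_fraud"]]
    frauds = accounts[accounts["is_fraud"]]
    caught = frauds[frauds["flagged"]]
    return {
        # share of false positives that hit high-value customers
        "high_value_fp_share": goods_flagged["high_value"].mean(),
        # speed of detection, in transactions before the fraud was caught
        "avg_txns_to_detect": caught["txns_to_detect"].mean(),
        # detection rate split by segment the issuer actually pays for
        "cp_detection_rate": frauds.loc[frauds["card_present"], "flagged"].mean(),
        "cnp_detection_rate": frauds.loc[~frauds["card_present"], "flagged"].mean(),
    }
```

Tracking these numbers release over release is what turns the list above into ongoing evaluation rather than a one-time benchmark.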

When modeling complex problems like fraud, there is no simple way to define what makes a “good model.” Evaluating the most effective model means always considering multiple dimensions of good.
