Garbage In, Garbage Out: Correcting Sample Bias

On this blog, I’ve previously warned about the dangers of ignoring data sample bias, a problem that often leads to predictive models that don’t perform to expectations. It’s an analytic example of “garbage in, garbage out” that, for lenders, can result in erroneous and often very expensive decisions, such as extending credit to the wrong people and seeing higher default rates as a result.

Today, I’ll focus a bit more on sample bias, referencing a case study to illustrate the risks. Specifically, I’ll share why PD (probability of default) models developed strictly on applicants who were accepted and opened accounts, ignoring those who were rejected or never booked, can make for subpar acquisition risk models.

I pulled the case study from a new FICO white paper, “How to Correct Sample Bias,” written by my colleague Nina Shikaloff. The case study is based on a loan origination scoring process that uses credit bureau data. Working with bureau data and a simulated origination process allowed us to compare the actual bad rates on the booked portfolio with those for the applicant population as a whole.

The results showed that the booked bad rates were significantly lower than those for the whole population, often with predictor patterns flattened and in some cases reversed. An acquisition risk model built on such biased data would yield a subpar score that rank-orders applicants less effectively and underestimates risk. The study also showed that a proprietary reject inference process (completed blind, without reference to the actual bad rates) produced inferred bad rates much more closely aligned with the actual ones.
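To make the mechanism concrete, here is a toy simulation in Python. It is only a sketch under assumptions of my own choosing and does not reproduce the case study, its data, or FICO's reject inference technique. The predictor x, the underwriter-only signal z, the logistic default curve, and the accept rule are all hypothetical. It shows how booking only the safer applicants both lowers the observed bad rate and flattens the apparent strength of a predictor.

```python
import math
import random


def simulate_applicants(n=100_000, seed=7):
    """Return (x, bad, booked) tuples for a hypothetical applicant pool."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)  # predictor the modeler sees (higher = safer)
        z = rng.gauss(0, 1)  # extra information only the underwriter used
        # Assumed true probability of default: falls as x + z rises.
        p_bad = 1 / (1 + math.exp(2.0 + x + z))
        bad = rng.random() < p_bad
        booked = x + z > 0   # assumed accept rule: book the safer half
        rows.append((x, bad, booked))
    return rows


def bad_rate(rows):
    """Share of bad accounts among the given rows."""
    return sum(1 for _, bad, _ in rows if bad) / len(rows)


def low_high_gap(rows):
    """Bad rate for x < 0 minus bad rate for x >= 0 (crude predictor strength)."""
    low = [r for r in rows if r[0] < 0]
    high = [r for r in rows if r[0] >= 0]
    return bad_rate(low) - bad_rate(high)


applicants = simulate_applicants()
booked = [r for r in applicants if r[2]]

print(f"bad rate, all applicants: {bad_rate(applicants):.3f}")
print(f"bad rate, booked only:    {bad_rate(booked):.3f}")
print(f"low-vs-high x gap, all applicants: {low_high_gap(applicants):.3f}")
print(f"low-vs-high x gap, booked only:    {low_high_gap(booked):.3f}")
```

In this toy setup, a booked applicant with a weak x must have had a strong z to be approved, so the booked sample makes weak-x applicants look safer than they are across the full applicant pool. A model fit only to the booked rows would inherit that distortion.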

Sample Bias Case Study Results

For the case study, we used a new, sophisticated sample bias correction technique inspired by the work of economist and Nobel laureate James Heckman. The technique further improves estimates of population bad rates, supports better decisions based on risk models, and has now been adopted as a standard by a number of our clients. This proprietary reject inference technique is now available to all our clients in our analytic modeling software, both FICO® Model Builder and FICO® Analytic Modeler Scorecard Professional.

To learn more about this new technique and dig into the results from the case study, I invite you to download our white paper, “How to Correct Sample Bias.”
