Model Management Best Practices: Part 2

Welcome to another Model Management Monday. This is the second in my blog series on model management, each post highlighting a best practice that supports both compliance and improved performance.

Best Practice #2: Prepare a Suitable Data Sample

Regulators require you to demonstrate that your model validation sampling techniques are complete, responsible and relevant, since incorrect or inaccurate sampling can impact model performance. This holds true for both the initial validation after you develop the model and your ongoing model validations.

For your initial validation, the sample you use should be independent of the development sample. An independent sample can reveal whether a model is over-fit to the training data, and it provides a more realistic benchmark for how the model is likely to perform in production.
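
One way to keep the validation sample independent is to set it aside before development begins. The following is a minimal Python sketch rather than a prescribed method: the function name `split_development_validation`, the 30% holdout fraction and the `account_id` field are all illustrative assumptions.

```python
import random


def split_development_validation(records, validation_fraction=0.3, seed=42):
    """Randomly partition records into disjoint development and validation sets."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split can be reproduced and documented
    shuffled = list(records)   # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    development = shuffled[n_validation:]
    return development, validation


# Hypothetical accounts, identified only by a made-up account_id.
accounts = [{"account_id": i} for i in range(1000)]
development, validation = split_development_validation(accounts)
print(len(development), len(validation))
```

Because the validation records are never seen during development, a large gap between development and validation performance is a signal that the model may be over-fit.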

For ongoing validation of models, we recommend that you:

  • Avoid sampling when possible. It is best to use all records from a given time period to validate the model. When seasonality is an issue, choose a scoring window that will eliminate any seasonality effect.
  • If sampling, ensure a representative and adequate sample. Insufficient sample size can lead to poor conclusions in the model validation. When possible, select a random sample that adequately represents all subpopulations of interest. Use stratified sampling to ensure that the sample contains sufficient records for each subpopulation and outcome class. Keep in mind the economic, market and product situation during the timeframe in which you pull your sample, since this may impact the accuracy of your results.
  • Be aware of data bias. It is highly likely that your sample will be biased, for example, by the decision strategies you apply to new applicants. For accounts scoring well above your cutoff, you should expect a reliable odds-to-score relationship. However, near the cutoff score, this pattern may reasonably weaken or even reverse due to the influence of well-chosen overrides. Furthermore, because business outcomes are not known for rejected applicants, you won’t see as strong a separation between goods and bads as in the development sample, even if performance inference was used. Be prepared to defend these influences to regulators.
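
The stratified sampling recommendation above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the `stratified_sample` helper, the segment and outcome labels, the population counts, the 10% sampling fraction and the floor of five records per stratum are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, sample_fraction, min_per_stratum=1, seed=0):
    """Draw a random sample separately from each stratum.

    Each stratum contributes roughly sample_fraction of its records, but never
    fewer than min_per_stratum (or all of its records if it is smaller).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for key in sorted(strata):  # sorted so the draw order is reproducible
        members = strata[key]
        target = max(min_per_stratum, round(len(members) * sample_fraction))
        sample.extend(rng.sample(members, min(target, len(members))))
    return sample


# Hypothetical population: the segments, outcomes and counts are made up.
population = []
for segment, outcome, count in [
    ("prime", "good", 900),
    ("prime", "bad", 50),
    ("subprime", "good", 40),
    ("subprime", "bad", 10),
]:
    population.extend({"segment": segment, "outcome": outcome} for _ in range(count))

# Stratify on subpopulation and outcome class together.
sample = stratified_sample(
    population,
    stratum_of=lambda r: (r["segment"], r["outcome"]),
    sample_fraction=0.10,
    min_per_stratum=5,
)
counts = Counter((r["segment"], r["outcome"]) for r in sample)
print(counts)
```

With the floor in place, the smallest strata are oversampled relative to their share of the population, so any pooled statistic computed on such a sample would need to be reweighted to reflect the original mix.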

Regulators will also inquire about your data hygiene processes. You should understand and document the accuracy of data sources, inputs, outputs, transformations and calculations for both model development and validation. Be prepared to demonstrate how you treat outliers and missing values. We recommend validating the reliability and quality of data sources yearly.
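
A simple, documented profile of each input field is one way to show how missing values and outliers are identified. The sketch below is only an example: the post does not prescribe a method, so the interquartile-range rule used here, the `profile_field` helper and the income figures are assumptions chosen for illustration.

```python
import statistics


def profile_field(values, iqr_multiplier=1.5):
    """Report the missing-value rate and flag outliers using the IQR rule.

    Missing values are represented as None. A value is flagged as an outlier
    when it lies more than iqr_multiplier * IQR outside the quartiles.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    missing = len(values) - len(present)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lower_fence = q1 - iqr_multiplier * iqr
    upper_fence = q3 + iqr_multiplier * iqr
    outliers = [v for v in present if v < lower_fence or v > upper_fence]
    return {
        "missing": missing,
        "missing_rate": missing / len(values),
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical annual incomes in thousands; None marks a missing value.
incomes = [42, 48, 51, None, 55, 47, 50, 530, None, 49]
profile = profile_field(incomes)
print(profile)
```

Flagging a value is only the first step. How flagged values and missing values are then treated (capped, imputed, excluded or left as-is) is a modeling decision that should be recorded alongside the profile.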

For more details on this and other best practices, download the FICO Insights white paper, "Comply and Compete: Model Management Best Practices" or Martin Butler’s paper on Model Management and Governance.
