“Garbage in, garbage out” may sound like a cliché, but that doesn’t make it any less true, and it is never more true than when building predictive models. Data analysts know that models can go off track in many ways, yielding inaccurate or non-actionable results. Particularly when working with observational data (rather than data from a designed experiment), models can be contaminated by sample selection bias. If sample selection bias is ignored and left uncorrected, the resulting models can lead to erroneous, and often expensive, decisions in a wide range of fields:
• Credit providers that extend credit to the wrong people see default rates rise and margins collapse.
• Political polling organizations that rely on historical or overly optimistic voting patterns suffer reputational damage when actual election returns diverge from their predictions.
• Marketers who are too wedded to historical impressions of who buys their products, or finds them useful, may ignore promising segments of prospective customers and miss out on the incremental revenue those segments would deliver.