Cybersecurity: To Be (Empirical), or Not to Be?

That is the question for cybersecurity risk assessment.

FICO has been in the analytics business since our inception back in 1956.  Our founders, Bill Fair and Earl Isaac, had the novel idea that businesses could make better decisions through data. Before anyone thought to call the resulting algorithms “analytics,” they set off to create game-changing approaches to correlating signals with outcomes to help companies manage risk, reduce expense, and maximize opportunities.

Bill and Earl began looking for problems they could solve through an empirical analysis of data, and credit underwriting was a use case that was well-suited to the technique. Most credit-granting organizations had credit applications tucked away in filing cabinets (a source of consistent signal data), and most also had a reasonable handle on outcomes – i.e., who was managing credit to terms and who was in arrears or in default.

The ability to relate data known at the time of the decision (in this case, underwriting) to outcomes experienced in the future is at the heart of predictive analytics. Ascertaining the real statistical correlations between these signals and the subsequent outcomes in order to build a model is the definition of "empirical". No guessing, no value judgments, no subjective opinions, and no bias.
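To make that definition concrete, here is a minimal sketch of measuring one such correlation. Everything in it is invented for illustration — the signal, the outcomes, and the numbers are hypothetical, not FICO data — but it shows the basic move: relate a value known at decision time to an outcome observed later, and measure the relationship rather than guess at it.

```python
# Hypothetical sketch: measuring the empirical correlation between a
# signal known at decision time and an outcome observed later.
# All data below is invented for illustration.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Signal: months at current address (from the application form).
# Outcome: 1 = managed credit to terms, 0 = default (observed later).
months_at_address = [3, 48, 12, 60, 6, 36, 24, 72]
paid_to_terms     = [0, 1,  0,  1,  0, 1,  1,  1]

r = pearson(months_at_address, paid_to_terms)
print(f"measured correlation: {r:.2f}")
```

A real scorecard combines many such measured relationships, but each one is grounded the same way: in observed signal-outcome pairs, not opinion.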

The absence of bias in empirically derived models is key to their success in fields such as credit underwriting, where our common interests have determined that there's no room for the prejudicial treatment of applicants. Empirically derived models also succeed on accuracy and performance. There are use cases where expert, or non-empirical, models are useful – especially when the data to train an empirical model isn't available. But when data is available, or can be collected, an empirical approach is generally preferred.

FICO has pioneered empirically derived algorithms in multiple domain spaces, and for multiple use cases. Where data has already existed, we’ve used it. Where it doesn’t exist, we’ve set up collaborative frameworks and industry consortia to collect it.

Cybersecurity With and Without Empiricism

Cybersecurity is another great use case for predictive analytics.  Fortunately, it is also one where the data needed to empower an empirical approach actually exists. Because of public reporting requirements for breach incidents involving data loss, we actually know quite a bit about cyber outcomes. When we couple this with data we collect at internet scale regarding the condition of infrastructure and behavior of organizations, we have a sound basis for the development of empirical models.

The FICO Enterprise Security Score is an empirically derived predictor of cybersecurity risk at the enterprise level. The ESS score translates directly to the likelihood of a future, significant breach event.  The model that generates the score is based on an empirical analysis of the condition of the internet-facing assets of organizations, and the observed behaviors they exhibit in the management of those assets. Observations on both breached and un-breached organizations have been compared, and the mathematical correlations between conditions, behaviors, and subsequent cyber breach outcomes have been measured.

The resulting empirical model does a very good job of predicting breach risk. In fact, the ability of the model to discern future "goods" (un-breached firms) from "bads" (breached firms) is more than twice as good as the published results of the only competitor brave enough to post a benchmark.
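One common way to quantify how well a score separates goods from bads is the AUC (area under the ROC curve): the probability that a randomly chosen good outscores a randomly chosen bad. The sketch below uses invented scores, and AUC is an assumption on my part — the benchmark metric behind the comparison above isn't specified in this post.

```python
# Hedged sketch of one standard separation metric: ROC AUC, computed
# as the fraction of (good, bad) pairs where the good firm's score is
# higher (ties count half). All scores below are invented.
def auc(good_scores, bad_scores):
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))

goods = [720, 690, 805, 760, 670]   # hypothetical un-breached firms
bads  = [540, 610, 680, 590]        # hypothetical breached firms

print(f"AUC: {auc(goods, bads):.2f}")
```

An AUC of 0.5 is coin-flipping; 1.0 is perfect separation, so "twice as good" comparisons are usually made on the lift above 0.5 rather than the raw number.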

So why is empirical better? One critically important reason is the impact of compound error. Expert or judgmental models translate opinions into numbers.  And while these judgments may in fact be expert, if they are not empirical they are likely to be imprecise. This introduces the challenge of imprecision at scale, or compound error. The results can get away from you quickly.

With our Enterprise Security Score, we're looking at the condition of hundreds or thousands of internet-facing assets, in multiple categories, for each assessed organization. We're also looking at organizational behaviors, based on how these assets are managed (or mismanaged) over time. Any time you are evaluating multiple factors in combination, the precision of each element becomes more important. Basing a model on multiple measured correlations, as we do at FICO, yields a better outcome than basing a model on multiple judgmental assessments, each contributing its imprecision to a watered-down result.
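The compounding effect is easy to demonstrate with a toy simulation. The numbers here are illustrative assumptions, not measurements: give each judgmental factor a modest ±10% error, combine several factors into one estimate, and watch the combined error grow roughly with the square root of the factor count.

```python
# Illustrative sketch (invented parameters): small per-factor errors
# compound when many imprecise judgments are combined multiplicatively.
import random
import statistics

def combined_relative_error(n_factors, per_factor_sd=0.10,
                            trials=20000, seed=7):
    """Std. dev. of the relative error when n_factors judgments,
    each off by ~N(0, per_factor_sd), are multiplied together."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        estimate = 1.0          # true combined value is 1.0
        for _ in range(n_factors):
            estimate *= 1.0 + rng.gauss(0.0, per_factor_sd)
        errors.append(estimate - 1.0)
    return statistics.pstdev(errors)

for k in (1, 4, 16):
    print(f"{k:2d} factors -> combined error sd ~ "
          f"{combined_relative_error(k):.2f}")
```

With measured correlations, each factor's weight is fit to data, so its error is bounded by the measurement rather than by the spread of expert opinion — which is the point of the paragraph above.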

To be (empirical), or not to be? Now that we both understand the answer, I think you know why I asked the question.
