Today’s Deep Dive: Innovative Unsupervised Learning in AI

This two-part blog unpacks the mysteries of two very different AI techniques: supervised and unsupervised learning.

by Scott Zoldi

Chief Analytics Officer

January 22, 2018

Categorically, artificial intelligence (AI) can appear to be an odd juxtaposition of order and disorder: we direct the AI with algorithms, yet the system produces new insights seemingly by magic. This two-part blog unpacks the mysteries of two very different AI techniques: supervised and unsupervised learning.

Supervised Learning: The Workhorse of AI

Most of the well-known applications of machine learning and computational AI involve supervised learning. The modeler amasses a vast set of existing data (e.g., financial transactions, internet photographs, or the texts of tweets) and a base-level “ground truth” outcome that is already known, perhaps in retrospect or by expensive human investigation.

Equipped with any number of computational algorithms, the scientist becomes the “supervisor” whose code trains the model to reproduce, in the lab, the known outcomes with a low probability of error. The models are then deployed to live a happy life scoring credit risk and fraud likelihood, finding pictures of Chihuahuas and muffins, or flagging insulting tweets. Technically, each model computes a probabilistically weighted predicted outcome that we believe to be like those outcomes from the training examples. The state of the art for supervised learning is now well established; you can choose from dozens of comprehensive predictive analytics and neural network packages.

Unsupervised Learning: Inferences in the Absence of Outcomes

But what if there is no set of “true outcomes” known, or the ones at hand are restricted in quality or quantity? What can machine learning do for us then? This is the domain of the far trickier unsupervised learning, which draws inferences in the absence of outcomes.

Good unsupervised learning requires more care, judgement and experience than supervised learning, because there is no clear, mathematically representable goal that the computer can blindly optimize without understanding the underlying domain.

The Challenge of Outlier Detection

A central task within unsupervised modeling is outlier detection: Which examples are most unlike most of their peers? Outlier detection and transaction fraud scoring provide an easy illustration:

Which customers request money transfers with patterns substantially different from most of their peers?
Which medical providers bill insurance for sets of claims most unlike their peers?
Which transactions on an individual payment card are most different from a customer’s usual behaviors?

The solution pattern for these tasks is a problem- and domain-specific transformation of the raw data into a quantitative vector space of features — up to now, exactly in line with supervised predictive modeling. This is followed by a more generic mathematical construction to yield a numerical score of the “degree of outlier-ness,” in the absence of ground-truth training outcomes.
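
As a rough illustration of that two-step pattern, here is a minimal Python sketch. It assumes two hypothetical features per customer (transfers per week and average transfer amount) and scores each customer by distance from the peer-group average in standardized units. The feature names, the sample values and the scoring function are all assumptions made for illustration, not a description of any FICO model.

```python
import math
import statistics

def fit_standardizer(rows):
    """Learn per-feature mean and standard deviation from unlabeled rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def outlier_score(row, means, stds):
    """Euclidean norm of the standardized feature vector (larger = more unusual)."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)))

# Step 1 (domain-specific): raw data turned into feature vectors.
# Hypothetical features: [transfers per week, average amount in dollars]
peers = [[2, 100.0], [3, 120.0], [2, 90.0], [4, 110.0], [3, 105.0], [2, 95.0]]

# Step 2 (generic): a numerical "degree of outlier-ness", with no labels used.
means, stds = fit_standardizer(peers)
typical = outlier_score([3, 100.0], means, stds)
unusual = outlier_score([20, 5000.0], means, stds)
```

Here `unusual` comes out far larger than `typical`, because the second customer sits many standard deviations from the peer group on both features. Standardizing each feature is itself a choice of metric, which is exactly where the judgement calls start.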

Because there are far fewer principles, and less didactic instruction and widely available software compared to classic supervised modeling, there are even more analytic “gotchas” requiring deep analytic scientist experience and judgement. Difficulties and considerations in outlier detection include:

The need to define a metric or distance. Many techniques require defining a “metric” or “distance” function between pairs of observations. One problem is that the individual components of the feature vector have qualitatively different meanings: how can one balance adding or subtracting apples and oranges, and kumquats and kangaroos? Often this is done ad hoc or, unfortunately, without intention, because the underlying algorithm implicitly assumes a metric. What should be done in the real-life scenario of a combination of quantitative and categorical features? Supervised modeling can often be blissfully ignorant of this problem, since the quantitative optimization with known targets tends to scale and transform each feature automatically, to the degree that it contributes predictive value.
In an unsupervised context, an explicit metric will have major influence on the scoring of outliers; this is imposed by the analytic scientist. Additionally, in a high-dimensional space, our intuitions about the properties of neighborhoods and neighborliness derived from our three-dimensional physical experience are very misleading: A randomly selected point in the training dataset is often not much further away than a point’s nearest neighbors. At FICO, we believe outlier statistics derived under these intuitions ought to be approached with caution.

Computational burdens on scoring. How expensive is it, in terms of computation and in memory, to score new observations with the outlier model? Do any complex data structures need to be created for scoring? Do we need to retain a significant fraction (or all) of the training data set to score a new observation in production?
Calibration and interpretation of score. If we have a number representing “degree of outlierness,” what does it mean? Does it have a well-behaved, approximately continuous score distribution under the natural data set, or is the distribution irregular, with significant delta functions or gaps? What happens when the dimension of the training set changes, i.e. are there major systematic trends?
Feature cross-correlation. This is a subtle yet critical problem that gets little attention in the field. Frequently, the underlying features are designed to address a particular aspect of the problem domain. Often a significant number of related, and therefore correlated, features cover some conceptual axes of the problem, while other aspects of behavior are represented by only a few features each. The effect on outlier scores may be severe. Can one balance this automatically, in a principled manner?
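
The warning about high-dimensional neighborhoods in the metric item above can be checked with a small experiment. The sketch below assumes points drawn uniformly from a unit hypercube (an illustrative choice, not real transaction data) and compares each query point's nearest-neighbor distance with its average distance to all other points. The function name and its parameters are hypothetical.

```python
import math
import random

def nearest_to_mean_ratio(dim, n_points=200, n_queries=20, seed=0):
    """Average ratio of nearest-neighbor distance to mean distance.

    Points are uniform in the unit hypercube, which is an illustrative
    assumption rather than a model of real feature vectors.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    ratios = []
    for i in range(n_queries):
        # Distances from query point i to every other point.
        dists = [math.dist(points[i], p) for j, p in enumerate(points) if j != i]
        ratios.append(min(dists) / (sum(dists) / len(dists)))
    return sum(ratios) / len(ratios)

low = nearest_to_mean_ratio(dim=2)
high = nearest_to_mean_ratio(dim=500)
print(f"2 dimensions:   nearest/mean distance ratio = {low:.2f}")
print(f"500 dimensions: nearest/mean distance ratio = {high:.2f}")
```

In two dimensions the nearest neighbor is a small fraction of the average distance away. In 500 dimensions the ratio moves close to 1, so "nearest" and "typical" become hard to tell apart, and outlier statistics built on neighborhood intuition lose much of their contrast.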

Requirements for Commercial-Grade Outlier Detection

Beyond clear technical issues, there are some higher-level properties that FICO scientists believe a state-of-the-art, commercial solution must address.

Qualitative diversity of detected outlier behavior. Commonly, the quantitatively highest-ranking observations under some outlier statistic may all be a result of one particular “type” of outlier, for instance, a single modus operandi of fraud or abuse. However, the subject matter expert knows there are substantial varieties of anomalous behavior possible. A superior approach would generate a qualitative diversity of outlier cases. This is a tough problem for the less sophisticated practitioner and virtually unaddressed in public literature, yet is very important in commercial application. Fully compensating for generalized feature cross-correlation in a principled algorithm goes a long way toward fulfilling this goal.
Qualitative versus quantitative outlierness: discerning “unknown unknowns.” Can we distinguish outliers that are a significant “quantitative” exaggeration of normal behavior from ones that are fundamentally distinct, in a qualitative sense, from the norm? Both ought to score high on an outlier statistic, but we want the second to score even higher.
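
To make the cross-correlation point concrete, here is a small sketch using the Mahalanobis distance, a standard textbook way to discount correlated features. It is not presented as FICO's algorithm. The synthetic data assume four noisy copies of one behavioral axis `a` and a single feature for a second axis `b`; all names and values are hypothetical.

```python
import math
import random

def mean_and_covariance(rows):
    """Sample mean vector and covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def euclidean(point, mean):
    return math.dist(point, mean)

def mahalanobis(point, mean, cov):
    """sqrt(diff^T cov^-1 diff): distance that discounts correlated directions."""
    diff = [p - m for p, m in zip(point, mean)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))

# Synthetic training data: axis a is covered by four correlated features,
# axis b by only one.
rng = random.Random(0)
rows = []
for _ in range(500):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([a + rng.gauss(0, 0.3) for _ in range(4)] + [b])
mean, cov = mean_and_covariance(rows)

odd_on_a = [2.5, 2.5, 2.5, 2.5, 0.0]  # about 2.5 sigma on the crowded axis
odd_on_b = [0.0, 0.0, 0.0, 0.0, 4.0]  # about 4 sigma on the sparse axis

print("Euclidean:  ", euclidean(odd_on_a, mean), euclidean(odd_on_b, mean))
print("Mahalanobis:", mahalanobis(odd_on_a, mean, cov), mahalanobis(odd_on_b, mean, cov))
```

Plain Euclidean distance ranks `odd_on_a` higher simply because its deviation is counted four times, while the Mahalanobis distance ranks the more extreme `odd_on_b` higher. This addresses only linear correlation, so it is a starting point rather than a full answer to the diversity problem described above.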

Check back for Part 2 of this blog, where I’ll tell you more about FICO’s latest innovations in outlier detection, and how we are applying these unsupervised learning breakthroughs in our fraud and compliance solutions.

Follow me on Twitter @ScottZoldi.

Scott Zoldi

Dr. Scott Zoldi is chief analytics officer at FICO, responsible for artificial intelligence (AI) and analytic innovation across FICO's product and technology solutions. Dr. Scott Zoldi has been listed as an inventor on 122 AI & software patents, in collaboration with other data and analytic scientists, and he is also named on an additional 40 patent applications in process. Scott is an industry leader in the responsible use of AI, Generative AI (GenAI), and Agentic AI, as well as an outspoken proponent of AI governance and regulation. His groundbreaking work in focused language models (FLMs) for GenAI and a patented use of blockchain technology for AI model development governance has helped propel Scott to AI visionary status. In support of the FICO Focused Foundation model, Scott recently accepted the Best AI Solution award at the 2026 Banking Tech Awards USA, and was honored in the Small Language Models Strategic Planning category at the BIG 2026 AI Excellence Awards. His recent awards include being a Finnovate 2026 finalist for Innovator of the Year, receiving the Constellation Research Award AI150, Tech Leadership Award from Banking Tech Awards, Tech Influencer Highly Commendable Award from DataIQ Data & AI Awards, San Diego Business Journal - Leaders of Influence in Technology (2025); Tech Leadership - Software & Services Provider from Fintech Futures, MachineCon AI100 Award, and Innovator Award from American Banker (2024). An enthusiastic member of the southern California tech community, Scott serves on the Boards of Directors of Software San Diego and the San Diego Cyber Center of Excellence. He received his Ph.D. degree in theoretical and computational physics from Duke University, and his work has been published in The Harvard Business Review and numerous scientific journals.

When not at his office or on a plane, Scott can often be found in his Ford Bronco, exploring the desert around San Diego with his family. To hear more of his views follow Scott on LinkedIn and BlueSky @ScottZoldi.
