
How to Use Alternative Data in Credit Risk Analytics


The use of alternative data in credit risk assessment has opened up considerably over the last few years. Alternative data is a hot topic, in part because of the recent explosion in available data, and in part because of the drive in lending for financial inclusion. Here is useful information on how to assess and analyze alternative data and combine it with so-called traditional data to improve credit risk analytics.

Multiple Types of Alternative Data Used in Credit Risk Analytics

What is alternative data? In credit granting, it generally refers to any data that is not directly related to a consumer’s credit behavior. Traditional data usually means data from a credit bureau, a credit application or a lender’s own files on an existing customer. This is the data most commonly used in credit scoring models. Alternative data is everything else: a variety of data sources and techniques.

There are an estimated 3 billion adults worldwide who don’t have credit and so don’t have credit records. Opening up that market is a priority for lenders. And while many of these consumers are in developing markets with nascent credit infrastructures, there are so-called “credit invisibles” in the most mature credit markets, people who have no credit and are unknown to the credit bureaus.

With this in mind, let’s look at a few sources of alternative data, and how useful these datasets are for lenders’ credit decisions.

Alternative Credit Data Sources

  • Transaction Data. This is typically data on how customers use their credit or debit cards. It may not seem “alternative” — most lenders have this data already, often manipulated into monthly summaries — but it’s not often mined to extract the maximum predictive value. It can be used to generate a wide range of predictive characteristics such as Ratios of Cash to Total Spend in last X week(s) or Ratios of Spend in last X week(s) to last Y week(s) and even characteristics based on the number, frequency and value of transactions at different retailer types. Processing this dataset can be time-consuming, but the data itself is generally clean.
  • Telecom / Utility / Rental Data. This dataset is basically credit history data, but it’s alternative because it doesn’t actually appear in most credit reports. FICO has mined this data for the FICO® Score XD in the United States.
  • Social Media Profile Data. Mining Facebook, LinkedIn, Twitter, Instagram, Snapchat or other social media sites is possible, but few lenders would want to brave the regulatory hurdles of being the first mover. Value could be derived not from what people say on these channels but from metadata, such as the number and frequency of posts or the size of a person’s social graph, but this would still likely raise privacy issues. In addition, despite what some enterprising fintechs might say, the value of this dataset would be far lower than the value of data with a stronger credit connection. Consumers can also manipulate this data.
  • Clickstream Data. How a consumer or applicant moves through your website, where they click and how long they take on a page can be predictive.
  • Audio and Text Data. This is information found on credit applications or in recorded customer service and collections calls. It can complement “thin” credit report files and is already proving its worth in collections.
  • Social Network Analysis. New technology enables us to map a consumer’s network in two important ways. First, it can identify all the files and accounts for a single consumer, even if the files have slightly different names or addresses. This gives you a better understanding of the consumer and their risk. Second, it can identify the individual’s connections with other people, such as people in their household. When evaluating a new credit applicant with little or no financial history, the credit ratings of the applicant’s network can provide useful information. However, this dataset will not meet the regulatory tests in all markets.
  • Survey / Questionnaire Data. This newer technique uses psychometrics to let lenders rate the credit risk of someone with little or no credit history.
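
To make the transaction-data characteristics mentioned above more concrete, here is a minimal Python sketch of two of them: the ratio of cash to total spend in the last X weeks, and the ratio of spend in the last X weeks to spend in the last Y weeks. The `Transaction` fields, the `is_cash` flag and all function names are assumptions made for illustration; a real transaction feed will have its own schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transaction:
    posted: date    # posting date of the transaction
    amount: float   # spend amount, assumed positive
    is_cash: bool   # hypothetical flag: cash withdrawal or cash advance


def spend_in_window(txns: Sequence[Transaction], as_of: date,
                    weeks: int, cash_only: bool = False) -> float:
    """Total spend in the `weeks` weeks ending on (and including) `as_of`."""
    start = as_of - timedelta(weeks=weeks)
    return sum(
        t.amount for t in txns
        if start < t.posted <= as_of and (t.is_cash or not cash_only)
    )


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return None instead of dividing by zero when there was no spend."""
    return numerator / denominator if denominator > 0 else None


def cash_to_total_ratio(txns: Sequence[Transaction], as_of: date,
                        weeks: int) -> Optional[float]:
    """Ratio of cash spend to total spend in the last `weeks` weeks."""
    return safe_ratio(spend_in_window(txns, as_of, weeks, cash_only=True),
                      spend_in_window(txns, as_of, weeks))


def recent_to_longer_spend_ratio(txns: Sequence[Transaction], as_of: date,
                                 x_weeks: int, y_weeks: int) -> Optional[float]:
    """Ratio of spend in the last `x_weeks` to spend in the last `y_weeks`."""
    return safe_ratio(spend_in_window(txns, as_of, x_weeks),
                      spend_in_window(txns, as_of, y_weeks))


# Made-up example transactions, for illustration only.
as_of = date(2024, 3, 31)
txns = [
    Transaction(date(2024, 3, 28), 50.0, False),
    Transaction(date(2024, 3, 20), 100.0, True),
    Transaction(date(2024, 3, 1), 150.0, False),
    Transaction(date(2024, 1, 15), 200.0, False),
]
print(cash_to_total_ratio(txns, as_of, weeks=4))
print(recent_to_longer_spend_ratio(txns, as_of, x_weeks=4, y_weeks=12))
```

Returning `None` when there is no spend in the window keeps "no activity" distinct from "zero ratio", which usually deserves its own bin in a scoring model.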

How Much Value?

FICO research has shown that these data sources do add predictive value at the margin to risk models based on traditional data. The amounts of predictive value shown in the chart below should be viewed as relative indicators, not absolute values, because the additional value of a data source depends on many parameters, such as the predictive power of existing models and the strength of the customer’s relationship with the lender.

Chart showing value of multiple kinds of alternative data

Please note that the Traditional Models used as the baseline were application models, not credit bureau score models (such as the FICO® Score).

The chart below shows the result of one project FICO did for a personal lending origination portfolio. The traditional credit characteristics captured more value than the alternative data characteristics (with the alternative data capturing about 60% of the predictive power), and there was a high degree of overlap between the two. However, by combining the traditional and alternative data characteristics (and understanding the overlap so as not to overweight certain variables’ contribution), we were able to produce a more powerful credit risk model.

Lift curve for FICO study
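
One way to quantify the marginal value discussed above is to compare a discrimination measure, such as the Gini coefficient, for a model built on traditional data alone against one built on the combined data. The Python sketch below is illustrative only: the scores are made-up toy numbers, not results from the FICO study, and `auc`, `gini` and the variable names are hypothetical. It assumes each score is a risk estimate where higher means more likely to go bad.

```python
from typing import Sequence


def auc(risk_scores: Sequence[float], is_bad: Sequence[int]) -> float:
    """Probability that a random bad account has a higher risk score than a
    random good account, counting ties as half (pairwise definition of AUC)."""
    bads = [s for s, b in zip(risk_scores, is_bad) if b == 1]
    goods = [s for s, b in zip(risk_scores, is_bad) if b == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good account")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bads) * len(goods))


def gini(risk_scores: Sequence[float], is_bad: Sequence[int]) -> float:
    """Gini coefficient, defined as 2 * AUC - 1."""
    return 2.0 * auc(risk_scores, is_bad) - 1.0


# Toy data for illustration only: 1 = account went bad, 0 = stayed good.
is_bad = [1, 1, 1, 0, 0, 0, 0, 0]
traditional = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]  # traditional-data model
combined = [0.9, 0.7, 0.5, 0.6, 0.4, 0.2, 0.2, 0.1]     # traditional + alternative

baseline_gini = gini(traditional, is_bad)
combined_gini = gini(combined, is_bad)
print(f"traditional Gini: {baseline_gini:.3f}")
print(f"combined Gini:    {combined_gini:.3f}")
print(f"marginal lift:    {combined_gini - baseline_gini:.3f}")
```

The pairwise loop is quadratic in the number of accounts, which is fine for a sketch. On a real portfolio you would use a rank-based calculation and measure the lift on a holdout sample.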

Machine Learning and Explainability in Credit Risk Models

It’s impossible to talk about alternative data without talking about the analytic technologies used to mine it, including machine learning techniques such as neural networks, random forests and stochastic gradient boosting. With large, unstructured datasets, the smart use of these technologies can identify data patterns that relate to credit risk and make the model development process more manageable.

However, as is true with AI in general, data scientists play an important role. They need to check the accuracy of the output, make sure the model doesn’t overfit the data, make sure the model provides stable output, and ensure that the patterns discovered are strong, relevant and explainable.
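
As one illustration of the stability check mentioned above, the Python sketch below computes a population stability index (PSI), a measure commonly used in credit scoring to compare the score distribution at development time with a more recent one. The text does not name a specific stability test, so PSI is an assumption here, as are the function names and the `floor` parameter used to avoid taking the logarithm of zero for empty bins.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_shares(scores: Sequence[float], cut_points: Sequence[float]) -> List[float]:
    """Share of scores in each bin. Sorted cut points c1 < c2 < ... define the
    bins (-inf, c1), [c1, c2), ..., [c_last, +inf)."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * (len(cut_points) + 1)
    for s in scores:
        counts[bisect_right(cut_points, s)] += 1
    return [c / len(scores) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        cut_points: Sequence[float], floor: float = 1e-6) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(bin_shares(expected, cut_points),
                    bin_shares(actual, cut_points)):
        e, a = max(e, floor), max(a, floor)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total


# Made-up scores for illustration only.
development = [550, 580, 620, 640, 660, 710, 720, 730]
recent = [540, 560, 570, 600, 630, 650, 700, 715]
print(f"PSI: {psi(development, recent, cut_points=[600, 700]):.3f}")
```

A PSI of zero means the two distributions are identical across the chosen bins. A commonly cited rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, but the thresholds a lender applies are a policy choice.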

Explainability is a challenge when dealing with AI and machine learning. Lenders need to explain how consumers are scored, certainly to regulators and often to consumers themselves. FICO uses a technology that takes the patterns identified by AI, machine learning and other techniques and turns them into scorecards that are easy to understand and implement, and that produce uplifts in predictive power similar to those of machine learning models. For more information on these techniques, see the blog post by FICO Chief Analytics Officer Scott Zoldi on How to Build Credit Risk Models Using AI and Machine Learning.
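
To show why scorecards are easy to explain, here is a minimal Python sketch of a generic additive points scorecard. It is not FICO’s technology: the characteristic names, bin boundaries, point values and base score are all hypothetical. Each characteristic contributes points according to the bin its value falls into, so the total score can be traced back to individual characteristics, and the largest point shortfalls can serve as reasons for a lower score.

```python
from typing import Dict, List, Tuple

# Hypothetical scorecard: each characteristic has ordered (upper_bound, points)
# bins, and a value falls in the first bin whose upper bound exceeds it.
SCORECARD: Dict[str, List[Tuple[float, int]]] = {
    "months_on_file": [(12, 10), (48, 25), (float("inf"), 40)],
    "cash_to_total_spend_4w": [(0.2, 35), (0.5, 20), (float("inf"), 5)],
    "utility_late_payments_12m": [(1, 30), (3, 15), (float("inf"), 0)],
}
BASE_POINTS = 500  # hypothetical starting score


def points_for(characteristic: str, value: float) -> int:
    """Look up the points for one characteristic value."""
    for upper_bound, points in SCORECARD[characteristic]:
        if value < upper_bound:
            return points
    raise ValueError(f"no bin for {characteristic}={value!r}")


def score(applicant: Dict[str, float]) -> Tuple[int, Dict[str, int]]:
    """Return the total score and the per-characteristic points breakdown."""
    breakdown = {name: points_for(name, applicant[name]) for name in SCORECARD}
    return BASE_POINTS + sum(breakdown.values()), breakdown


def reason_codes(breakdown: Dict[str, int], top_n: int = 2) -> List[str]:
    """Characteristics where the applicant lost the most points versus the
    best attainable bin, largest shortfall first."""
    shortfall = {
        name: max(points for _, points in SCORECARD[name]) - earned
        for name, earned in breakdown.items()
    }
    ranked = sorted(shortfall.items(), key=lambda item: item[1], reverse=True)
    return [name for name, lost in ranked[:top_n] if lost > 0]


# Made-up applicant, for illustration only.
applicant = {
    "months_on_file": 6,
    "cash_to_total_spend_4w": 0.3,
    "utility_late_payments_12m": 0,
}
total, breakdown = score(applicant)
print(total, breakdown, reason_codes(breakdown))
```

Because the score is a simple sum, the same points table can be shown to a regulator, implemented in a decision engine and used to generate the reasons given to a consumer.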


This is an update of a post first published in August 2017.
