Get the Balance Right: Machine Learning vs. Human Expertise

By Andrew Jennings

Recently on the Banking Analytics Blog, I outlined the imperatives that can be practiced by every company, irrespective of size or analytic sophistication. One key imperative is to get the balance right between machine learning and human expertise.

Consider the development of a predictive scorecard. We can often boost performance by creating segmented scorecards that go beyond analyzing customer characteristics in an additive manner to examining complex interactions between them. But because the number of possible segmentation schemes is immense, a manual search is unlikely to find the best-performing scheme in a timely manner.
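To make the idea of segmentation concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production scorecard: the characteristic names (utilization, months_on_book), the split at 12 months, and every point value are hypothetical. It shows how the same characteristic value can earn different points depending on the segment, which is the kind of interaction a single additive scorecard cannot express.

```python
from typing import Dict, List, Tuple

# A scorecard maps each characteristic to a list of (upper_bound, points) bins.
# All names, bounds and point values below are hypothetical.
Scorecard = Dict[str, List[Tuple[float, int]]]

NEW_CUSTOMER_CARD: Scorecard = {
    "utilization": [(0.3, 40), (0.7, 20), (float("inf"), 0)],
    "months_on_book": [(6, 0), (float("inf"), 15)],
}
ESTABLISHED_CUSTOMER_CARD: Scorecard = {
    "utilization": [(0.3, 25), (0.7, 15), (float("inf"), 5)],
    "months_on_book": [(24, 10), (float("inf"), 30)],
}


def additive_score(card: Scorecard, customer: Dict[str, float]) -> int:
    """Sum the points of the first bin whose upper bound exceeds each value."""
    total = 0
    for name, bins in card.items():
        value = customer[name]
        for upper_bound, points in bins:
            if value < upper_bound:
                total += points
                break
    return total


def segmented_score(customer: Dict[str, float]) -> int:
    """Pick a scorecard by segment, then score additively within that segment."""
    if customer["months_on_book"] < 12:  # hypothetical segmentation rule
        card = NEW_CUSTOMER_CARD
    else:
        card = ESTABLISHED_CUSTOMER_CARD
    return additive_score(card, customer)


# Same utilization, different segments, different points for utilization.
print(segmented_score({"utilization": 0.5, "months_on_book": 8}))   # 35
print(segmented_score({"utilization": 0.5, "months_on_book": 36}))  # 45
```

Each segment stays a simple additive scorecard, so the points remain easy to read and explain. The hard part is choosing the segmentation rule, which here is simply assumed.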

Machine learning speeds up the search by crunching through Big Data to test large numbers of characteristic interactions. For example, Tree Ensemble Models (TEMs), a type of machine learning algorithm, have proved helpful in finding segmentation schemes that capture more complex customer behavior patterns with less danger of overfitting to “noise” (relationships specific to development data and thus not generally reliable for making predictions on production data). In a laboratory study for a Chinese insurance company, we found that a TEM was nearly twice as effective at predicting auto insurance fraud as a traditionally developed model.

Making these insights useful in operations, however, requires human involvement. Analytic expertise is essential to compensate for biases and “holes” in the development data and to bridge the gap to production data, which will be different and will vary, often rapidly, over time. Moreover, instead of deploying the TEM itself – essentially a “black box” of hundreds of decision trees, difficult to understand, deploy and explain to regulators – FICO has innovated a method of transmuting TEM insights into a segmented scorecard. This technique, which we’re now using successfully for clients outside of the lab, has the advantage of greater transparency and more straightforward implementation. In addition, scorecards allow business experts to incorporate domain knowledge into predictions and customer treatments.

We can see the need for a similar partnership in the analysis of text and voice records. Businesses have collected a lot of this information – along with other forms of unstructured/semi-structured data, it now accounts for some 90 percent of consumer data – but have so far analyzed almost none of it.

Our research demonstrates that text analysis, while having some predictive value of its own, is extremely effective when folded into structured-data models. Initial steps include data cleansing and standardization (analyzing collector notes, we found 90 percent of the content consisted of abbreviations, codes, misspellings and garbled text). We next extract text features and transform them into numerically based customer characteristics. A wide range of techniques can be used to accomplish this, from simple keyword flagging and indexing to more sophisticated methods like our patented context vector modeling, which examines clusters of words in order to understand meaning, and LDA, a patent-pending topic analysis method.
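As a rough illustration of the simplest end of that range, the sketch below cleans a collector note and turns keyword hits into 0/1 characteristics. It uses only the Python standard library. The abbreviation map, the keyword lists and the sample note are all hypothetical, and this is plain keyword flagging, not context vector modeling or topic analysis.

```python
import re
from typing import Dict

# Hypothetical shorthand seen in collector notes, mapped to standard wording.
ABBREVIATIONS = {
    "cust": "customer",
    "pmt": "payment",
    "ptp": "promise to pay",
    "lm": "left message",
}

# Hypothetical characteristics, each triggered by any of its phrases.
KEYWORDS = {
    "promise_to_pay": ("promise to pay",),
    "dispute": ("dispute", "disputes", "disputed"),
    "hardship": ("lost job", "unemployed", "medical"),
}


def clean_note(note: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def keyword_flags(note: str) -> Dict[str, int]:
    """Return one 0/1 characteristic per keyword group, matching whole words."""
    text = f" {clean_note(note)} "
    return {
        name: int(any(f" {phrase} " in text for phrase in phrases))
        for name, phrases in KEYWORDS.items()
    }


print(keyword_flags("Cust PTP Fri; lost job, LM w/ spouse"))
# {'promise_to_pay': 1, 'dispute': 0, 'hardship': 1}
```

The resulting flags are ordinary numeric characteristics, so they can sit alongside structured data such as balances or payment history in the same model.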

Because the number of candidate characteristics in text and voice data can be very large, automation is very helpful. With Big Data technologies, we will have an increasing number of text mining methods available for crunching through unstructured data to find patterns. We can expect these to become commonly used.

The real opportunity for competitive advantage rests in human involvement, specifically in how experts who understand the business problem:

  • Apply automation to extract the most useful characteristics.
  • Incorporate text-derived numeric characteristics with the more traditional data in predictive models.
  • Make the most of text characteristics once they are in numerical form.

More broadly, there’s potential to use text analytics (particularly methods examining context) to gain insight into customer attitudes and intentions, which have been difficult to analyze from structured data alone.

We will continue this conversation next week on the Banking Analytics Blog. Additional information on the imperatives of next-generation learning is available in my recent Insights white paper: "When Is Big Data the Way to Customer Centricity?" (registration required).
