In my last post about artificial intelligence, I discussed analytic technologies that are making smart machines smarter. But even the smartest machines lack fundamental human characteristics that enable us to solve problems. One of these is curiosity — surely a computer can’t replicate that?
Welcome to the world of neuro-dynamic programming.
This is an analytic technique based on reinforcement learning, which mirrors the way we learn complex tasks whose payoff comes only in the long term. Neuro-dynamic programming enables smart machines to think ahead.
To illustrate this, let’s say we want to increase customer lifetime value and business cash flow. Each time the business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts, and so on.
At any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess game, where a checkmate could be rooted 10 moves back—or even in the first move—the loss of a valuable customer may have started with actions taken months ago.
Predicting the Outcome of a Sequence of Actions and Reactions
The figure depicts how analytics can learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credits or penalties, reinforcement learning distributes some amount of them across the entire sequence of actions.
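To make the credit-sharing idea concrete, here is a minimal sketch, assuming a simple geometric discount: each step further back from the outcome receives a fixed fraction (gamma) of the credit given to the step after it. The function name, the outcome value and the gamma of 0.5 are illustrative assumptions, not details from any production system.

```python
def discounted_credits(outcome, n_actions, gamma=0.5):
    """Credit (or penalty, if outcome is negative) assigned to each action.

    Actions are listed earliest first. The last action gets the full
    outcome; each earlier action gets gamma times the credit of the
    action that followed it. gamma is an assumed discount factor.
    """
    return [outcome * gamma ** (n_actions - 1 - i) for i in range(n_actions)]


# Four actions led to an outcome worth 100 (hypothetical units).
credits = discounted_credits(outcome=100.0, n_actions=4, gamma=0.5)
print(credits)  # [12.5, 25.0, 50.0, 100.0]
```

The most recent action receives the largest credit, but even the first action in the sequence is credited with a share of the result.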
Learning to Make Better Decisions with a Long-Term Perspective
During training with historical data, the model learns to associate value (total discounted rewards and penalties) with a customer state and each of the potential actions the business can take at that point. When presented with new data on a customer indicating a given state, it’s able to predict the long-term value of taking one action over another.
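The training step described here can be sketched with a small lookup table standing in for the neural network that neuro-dynamic programming would normally use. This is a simplified illustration, not the actual model: the customer states, actions, rewards, and the gamma and alpha settings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical logged history: (state, action, reward, next_state).
# A next_state of None marks the end of the recorded sequence.
HISTORY = [
    ("new", "discount", -5.0, "engaged"),   # the discount costs money now
    ("new", "no_offer", 0.0, "at_risk"),
    ("engaged", "upsell", 20.0, None),      # but it pays off later
    ("at_risk", "no_offer", 0.0, None),
]
ACTIONS = ["discount", "no_offer", "upsell"]


def learn_values(history, gamma=0.9, alpha=0.5, sweeps=200):
    """Estimate the long-term value of each (state, action) pair.

    Repeatedly replays the logged history, nudging each estimate toward
    the immediate reward plus the discounted value of the best action
    available in the next state (a tabular Q-learning update).
    Pairs never seen in the history keep a default value of 0.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in history:
            if next_state is None:
                future = 0.0
            else:
                future = max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def best_action(q, state):
    """Pick the action with the highest estimated long-term value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = learn_values(HISTORY)
print(round(q[("new", "discount")], 2))  # 13.0
print(best_action(q, "new"))             # discount
```

Although the discount loses 5 in the short term, the learned value for a new customer is about 13, because it leads to a state where an upsell worth 20 becomes possible. A model that looked only at the immediate reward would have chosen to make no offer.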
To improve business actions quickly, analytics must have a way to learn causal relationships from data (this change in action A causes outcome Y to change in this specific way). To do so, the algorithm performs experiments, varying the actions taken on test samples of customers. An alternative process for improvement is adaptive control.
Here’s where curiosity comes in. The goal of the algorithm is to learn faster, which means testing new actions to measure the response and adjust its concept of the world. The algorithm is therefore built to try out new decision cuts again and again. Like a curious person, it is never satisfied with the data it has; it is always asking “what if…?”
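One common way to build this kind of curiosity into an algorithm is an epsilon-greedy rule: most of the time take the action currently believed to be best, but for a small fraction of customers try something else to see what happens. The sketch below assumes that rule purely for illustration; the post does not say which exploration scheme is used, and the action names and values are hypothetical.

```python
import random


def choose_action(values, epsilon=0.1, rng=random):
    """Pick an action from a dict of action -> estimated long-term value.

    With probability epsilon, explore by picking an action at random;
    otherwise exploit the action with the highest current estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return max(values, key=values.get)


# Hypothetical current estimates for one customer state.
estimates = {"discount": 13.0, "no_offer": 0.0, "upsell": 4.0}

# Roughly 10% of these customers receive an experimental action.
rng = random.Random(42)
picks = [choose_action(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count("discount") / len(picks))
```

Raising epsilon makes the algorithm more curious and speeds up learning, at the cost of more customers receiving an action that is not the current best guess.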
Neuro-dynamic programming could improve many areas of operations—originations, customer management, marketing and collections—where we currently use decision modeling and optimization to pinpoint the best next action. Adding AI-based techniques, we should be able to look beyond the immediate consequences of the next action and to reason about sequences of actions and reactions leading to long-term results. Augmented with synthetic curiosity, advancing AI techniques are starting to offer some of the wisdom of human experience while retaining objective data-driven quantification.
For more information, see the Insights paper on “Does AI + Big Data = Business Gain?”