ReSci - Retention Marketing & Predictive Analytics
PREDICTIVE ANALYTICS
Research & Data Science Whitepaper Series
Retention Science 2601 Ocean Park Blvd. #104, Santa Monica, CA 90405 RetentionScience.com 310 598.6658
Table of Contents
Introduction
Chapter 1: What is Customer Churn?
Chapter 2: Customer Future Value, Part 1
Chapter 3: Customer Future Value, Part 2
Chapter 4: Welcome Purchase Probability, Part 1
Chapter 5: Welcome Purchase Probability, Part 2
Conclusion
About This Whitepaper
Customer retention is an important topic for many reasons, but the most compelling is also the simplest: your existing customers are extremely valuable. According to research by Gartner Group, 80% of your future sales will come from 20% of your customers. Further, a Harvard Business School report states that increasing your customer retention by just 5% increases profits by 25-95%.

This information highlights something we already believe here at ReSci: that it's crucial for businesses to invest in understanding and retaining their customers. Our data science team spends a lot of time thinking deeply about customer retention for commercial businesses, so we decided to dive deeper into the retention metrics most important to your business.
Eric Doi is a data scientist at Retention Science. His goal is to improve every day, just
like gradient boosted learners. He studied Computer Science at UC San Diego and
Harvey Mudd College.
Chapter 1: What is Customer Churn?
Retention research focuses on two fundamental questions:
1. Can we objectively measure whether customers will stick
around and make a purchase?
2. Can we predict these measurements, so that retailers can take actions
to keep their customers happy, engaged, and coming back?
These are complex questions we may never solve with 100% accuracy, but
with the help of predictive metrics, we can get pretty close.
Figure 2.1: The retention cycle
In Figure 2.1 above, we see what we call the retention cycle. On the left of Fig 2.1, customers are
acquired (e.g., they register on your eCommerce site), following your marketing funnel. Some of
these users then make purchases, moving them into the converted bubble, and some of them
never become customers at all. People who stop being – or never became – paying customers
have churned.
Churn can be a tricky thing to define because it happens at so many stages of the retention
cycle. A smaller group of paying customers becomes repeat purchasers until they stop, at
which point they’ve churned as well. Another percentage of customers only ever make one
purchase, in which case they move from the converted to the churned bubble directly. Basically,
customers can churn from any bubble in the cycle.
Each transition in this cycle needs to be measured and managed in some way, as each transition to churn represents potentially lost revenue for your business. By modeling these transitions, you can measure and predict the aspects of customer behavior that matter most to your business.
However, people quit being customers for any number of reasons, which makes predicting this
value difficult. This is why we use machine learning to help predict churn.
At its core, machine learning is all about computer programs that adapt themselves to the problem at hand; for instance, machine learning could be used to identify potential VIP customers based on attributes such as website and purchase behavior. Machine learning helps identify latent features: obvious and non-obvious attributes of customer behavior, such as location, gender, and recent order categories, that pick up on the less easily measurable influences that cause people to buy or not.
Algorithms account for a large number of features, such as customer information, behavior, order history, and website activity. They then place customers along a continuum from 0 to 1, where 0 represents a customer who will definitely stay and 1 represents a customer who will definitely leave the business. Any number in between can be interpreted as the likelihood that the person will quit being a customer.
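To make that concrete, here is a minimal sketch (not ReSci's production pipeline) of producing such a 0-to-1 churn score with a random forest classifier; the feature names and the customers.csv file are hypothetical stand-ins.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical snapshot: one row per customer with behavioral features
# and a historical label (1 = churned, 0 = still an active customer).
df = pd.read_csv("customers.csv")
features = ["days_since_last_order", "order_count", "avg_order_value", "site_visits_30d"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# predict_proba gives one probability per class; the churn-class column is
# the 0-to-1 score described above.
churn_score = model.predict_proba(X_test)[:, 1]
print(churn_score[:10])
```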
Modeling (and Defining) Churn
At a deeper level, churn is modeled using an ensemble of a number of methods. We combine
classic RFM (Recency, Frequency, Monetary value) models, linear and nonlinear machine
learning classifiers to predict churners versus non-churners, and knowledge-based models
that take clues from business-specific information. For example, if your business sells diapers,
then the size of the diapers a customer orders is a great proxy for the child’s age and predicts
pretty well when that customer will churn. This, however, means very little to you if you sell tires.
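As an illustration of the ensemble idea, the sketch below blends a toy RFM-style score, a classifier probability, and a knowledge-based score; the rfm_churn_score heuristic and the weights are illustrative assumptions, not ReSci's actual blend.

```python
import numpy as np

def rfm_churn_score(recency_days: float, frequency: int, monetary: float) -> float:
    """Toy RFM-style score in [0, 1]: long gaps, few orders, and low spend raise churn risk."""
    recency_part = np.clip(recency_days / 365.0, 0, 1)
    frequency_part = 1 - np.clip(frequency / 20.0, 0, 1)
    monetary_part = 1 - np.clip(monetary / 1000.0, 0, 1)
    return float((recency_part + frequency_part + monetary_part) / 3)

def ensemble_churn_score(rfm: float, classifier_prob: float, knowledge: float,
                         weights=(0.4, 0.4, 0.2)) -> float:
    """Blend the classic RFM score, a machine-learned probability, and a business-rule score."""
    return float(np.average([rfm, classifier_prob, knowledge], weights=weights))

# Example: 200 days since last order, 3 orders, $150 spent, a model probability of 0.7,
# and a business rule (e.g., the customer has outgrown the diaper sizes they buy) of 0.9.
rfm = rfm_churn_score(recency_days=200, frequency=3, monetary=150)
print(ensemble_churn_score(rfm, classifier_prob=0.7, knowledge=0.9))
```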
Windowing user activity can also be a useful churn proxy. For instance, if a user in a monthly subscription
business keeps postponing her order for six months, that user has likely churned. In order to
do this effectively, however, we spend significant time investigating different temporal windows
and their effects on different industries and businesses; for example, you probably buy diapers
more frequently than tires. This is where business-level customization is imperative: the more
customized the models, the greater and more beneficial the impact.
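A rough sketch of that windowing idea follows; the column names and the 180-day window are assumptions, not ReSci's actual rules.

```python
import pandas as pd

def label_churn(orders: pd.DataFrame, window_days: int) -> pd.Series:
    """Flag a customer as churned if their most recent order is older than the window.

    `orders` is assumed to have `customer_id` and datetime `order_date` columns.
    `window_days` is business-specific: short for diapers, much longer for tires.
    """
    last_order = orders.groupby("customer_id")["order_date"].max()
    days_inactive = (pd.Timestamp.today() - last_order).dt.days
    return days_inactive > window_days

# Example: a monthly-subscription business might treat ~6 months of inactivity as churn.
# churned = label_churn(orders, window_days=180)
```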
This brings up an interesting point: Even the notion of churn itself differs across businesses
and situations. For example, in a pure subscription model without postponement, churn simply means a user has unsubscribed. This is straightforward because within this business
model, the customer’s only options are to subscribe and pay, or not. This is common for things
like cable or Internet service.
In ad-hoc purchase models, however, such as most eCommerce sites, churn is defined as
customers who stop being paying customers. Compared to the yes/no definition of churn for
pure subscription companies, this definition is trickier to pin down and can vary from business
to business.
It may involve defining churners statistically, like people who haven't purchased for a period that is a few standard deviations longer than the average time between purchases, or someone whose time on the site without a purchase far exceeds the average customer lifetime. Alternately, it may mean taking more knowledge-based approaches rooted in common sense: customers who only buy apples are likely to churn once you stop selling apples. In an even more specific example, a media company may define churn as the point when people stop watching their videos online.
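A minimal sketch of the statistical version of that definition, where the cutoff is the average time between purchases plus a few standard deviations (the multiplier k = 2 is an arbitrary example):

```python
import pandas as pd

def inactivity_cutoff(orders: pd.DataFrame, k: float = 2.0) -> float:
    """Churn cutoff in days: mean inter-purchase gap plus k standard deviations.

    `orders` is assumed to have `customer_id` and datetime `order_date` columns.
    """
    gaps = (
        orders.sort_values("order_date")
        .groupby("customer_id")["order_date"]
        .diff()              # time since each customer's previous order
        .dt.days
        .dropna()
    )
    return float(gaps.mean() + k * gaps.std())

# A customer whose days since last purchase exceed this cutoff would be
# labeled a churner under this definition.
```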
In our research, we've found that ad-hoc purchase models can also predict when customers will unsubscribe from subscription-only businesses. This is significant because it demonstrates the models' accuracy in picking up signals of purchasing behavior, regardless of whether those purchases are made in a timely, prescribed fashion through subscription companies, or made on the fly and on demand via traditional eCommerce sites.
Below is a validation report for a churn model that combines both a classic RFM churn score
and a random forest classifier, a machine learning method that learns different sets of rules
that determine churners and non-churners. It shows the results of training a model on data up
until 1/1/2015 and then testing that model on data through 7/1/2015. The results demonstrate
how well the model performed on that day.
Churn Model Properties

Model Name: Ensemble (Classic RFM Churn Score, Random Forest Classifier)
People who did not buy (AKA churned) who were predicted as non-buyers: 432,616
People who did not buy (AKA churned) who were predicted as buyers: 153,834
Non-buyer identification accuracy: 73.76%
People who did buy who were predicted as buyers: 176,489
People who did buy who were predicted as non-buyers: 17,529
Buyer identification accuracy: 90.96%
As you can see in the report above, customers who churned (i.e., non-buyers) are identified with almost 74% accuracy, and customers who will not churn (i.e., buyers) are identified with almost 91% accuracy.
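For reference, the two accuracy figures in the report are per-class rates computed directly from the four counts above; a quick sketch:

```python
# Counts taken from the churn model validation report above.
churned_predicted_nonbuyer = 432_616   # churners the model got right
churned_predicted_buyer = 153_834      # churners the model missed
buyers_predicted_buyer = 176_489       # buyers the model got right
buyers_predicted_nonbuyer = 17_529     # buyers the model missed

nonbuyer_accuracy = churned_predicted_nonbuyer / (
    churned_predicted_nonbuyer + churned_predicted_buyer
)
buyer_accuracy = buyers_predicted_buyer / (
    buyers_predicted_buyer + buyers_predicted_nonbuyer
)

print(f"Non-buyer identification accuracy: {nonbuyer_accuracy:.1%}")  # ~74%
print(f"Buyer identification accuracy: {buyer_accuracy:.1%}")         # ~91%
```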
Finally, Figure 2.3 below shows the Root Mean Square Error of the probabilities we generate for
each customer belonging to each class. Essentially, we want to know, on average, how well
(or not) we did at classifying each churner using the probability we assign that the customer
will churn.
Figure 2.3: Root Mean Square Error of Churn Class Predicted Probabilities
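For clarity, a minimal sketch of that error measure, where the observed outcome is 1 if the customer actually churned and p is the predicted churn probability:

```python
import numpy as np

def churn_probability_rmse(p_churn: np.ndarray, actually_churned: np.ndarray) -> float:
    """Root Mean Square Error between predicted churn probabilities (0-1)
    and observed outcomes (1 = churned, 0 = did not churn)."""
    return float(np.sqrt(np.mean((p_churn - actually_churned) ** 2)))

# Example: three customers scored 0.9, 0.2, and 0.6; the first two actually churned.
print(churn_probability_rmse(np.array([0.9, 0.2, 0.6]), np.array([1, 1, 0])))
```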
The Bottom Line: Why Churn Matters
Churn matters because it not only predicts when customers will stop purchasing, so you can target them to keep them happy; it also gives you deep insight into the types of customers that represent your greatest champions and the types of customers that are your biggest distractions.
To that end, it is interesting to analyze which aspects of customers tend to be most influential
in causing churn. Customer churn happens for qualitative reasons that are difficult to quantify
even for companies with rich troves of customer data. Although we caution that correlation doesn't necessarily mean causation, correlating these qualitative reasons to quantitative metrics can yield good modeling results, and good insights for your business.
Factors that commonly correlate with churn include: purchase recency, incentives and frequency of purchase, past purchase behavior, sentiment, demographics (gender, age, etc.), census data, and web/app behavior.
As a quick aside, the notion of Customer Lifetime Value (CLV, sometimes called Lifetime Value (LTV)) is a standard metric in SaaS and eCommerce businesses. CLV can be explicitly separated into two parts: the deterministic part, which is based on past order history, and the predictive part, the future value that the customer will bring to the business, which is the CFV (Customer Future Value).
CFV is a powerful prediction tool for a number of use cases. Fundamentally, it informs which customers will be worth more in the future and, therefore, are worth nurturing. It also influences the types of discounts you may present to customers to keep them happy. A low CFV customer just might not be worth that 10% coupon, because even if it brings him or her back into the fold, that customer won't spend enough to justify the offer. From a retention perspective, CFV gives a company the insights required to understand the value of their retained customers, and what impact that has on their revenue. For instance, it allows a company to quantify how their efforts to reduce churn impact their expected revenue.
As with churn, CFV can provide powerful insights into your business at both the individual customer level and at the holistic level. For instance, consider a plot of CFV for your whole customer base (the CFV distribution). There are clearly whole segments that will contribute significant amounts to the revenue, and many customers who will not.

What's more, by grouping users by CFV, you can use a statistical approach based on similarity, bucketing users into low, medium, and high CFV groups. This allows marketers to target specific groups with targeted campaigns; for instance, you might send VIP invitations to the high CFV segment to ensure their retention.
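A minimal sketch of one way to form those buckets, using simple quantile cutoffs; the equal thirds are an assumption, and real segment boundaries would be tuned per business.

```python
import pandas as pd

# Hypothetical per-customer CFV predictions.
cfv = pd.Series({"u1": 3.10, "u2": 55.00, "u3": 12.40, "u4": 210.00, "u5": 0.75})

# Split the customer base into low / medium / high CFV terciles.
buckets = pd.qcut(cfv, q=3, labels=["low", "medium", "high"])
print(buckets)

# Marketers could then pull the "high" segment for a VIP campaign:
vip_segment = buckets[buckets == "high"].index.tolist()
```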
Consider Figure 3.1 below, which plots each user's cumulative contribution to the total CFV. The x-axis shows the percentage of users, and the y-axis the percentage of the total CFV. From the graph, 20% of the users are expected to contribute more than 80% of the future revenue, based on CFV. That's an astounding finding for a business, though not a total surprise if you follow Pareto's observations. This highlights the necessity for retention, where keeping those top customers is crucial and re-engaging the lower CFV customers can also dramatically increase revenues -- for instance, by converting lower CFV customers into repeat purchasers.
PL (Purchase Likelihood) = 1 - Churn Score
(for subscription-only businesses)
Now that we know how likely it is for someone to make a purchase, we have half of our inputs for
the CFV. Next, we need to figure out how much someone is likely to spend. For that, we introduce
the Average Order Value (AOV). In the case of this subscription example, assume that a customer
stays subscribed for 6 months and that there is only one subscription option that costs $10.00
each month. Then one simple model is to assume the AOV is simply $10.
Predicting the average amount a user will spend is also a rich and interesting problem. For instance, you could assume that someone's past purchase behavior is enough information to predict how much that person might spend in the future. Or you could assume that all of the customers are more or less the same, and use global information about your entire customer base to predict this value.
Now that we have the AOV and the PL, we define CFV as:

CFV = PL × AOV × (expected number of purchases within the time window)

The time window for CFV is the length of time over which the CFV expectation is computed, such as 3 months or 6 months. In our subscription model, this becomes:

CFV = PL × AOV × (time window, in months)

Earlier we assumed one purchase every month, so the time window for CFV is identical to the number of expected future purchases.
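Putting the pieces together, here is a minimal sketch of the subscription example; the churn score of 0.3 is an arbitrary illustration, not a typical value.

```python
def customer_future_value(churn_score: float, aov: float, expected_purchases: int) -> float:
    """CFV = Purchase Likelihood x Average Order Value x expected purchases in the window."""
    purchase_likelihood = 1.0 - churn_score  # PL for a subscription-only business
    return purchase_likelihood * aov * expected_purchases

# Subscription example from the text: a $10/month plan, a 6-month window,
# one purchase per month, and an assumed churn score of 0.3.
print(customer_future_value(churn_score=0.3, aov=10.00, expected_purchases=6))  # 42.0
```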
Chapter 3: Customer Future Value, Part 2
In this whitepaper, we're diving deep into the metrics and methods that are essential for data-driven retention marketing. Customer Future Value is one of the predictive metrics that help marketers determine which customers to nurture based on their future impact on the business.
Applications of CFV
CFV can be useful in a number of ways; for instance, it can cohort your users into those who are
projected to be “big spenders” and those who are not. It then allows the company to target each
cohort differently.
It also allows for deep and actionable audience segmentation at a more granular level. For instance, we can break CFV down by state, as shown in the figure below. In this figure, the state for each customer is shown on the x-axis, the left y-axis shows the number of customers from that state (represented as the dotted line), and the right y-axis shows the average CFV for customers from that state (represented by the blue bar).
In this example, it's very clear that not only do the most customers come from California, but those customers are also predicted to be the biggest spenders, by far, as compared to the rest of the states. Rounding out the top states are New York, Texas, Illinois, and Florida. Therefore, when marketers need to decide where to place marketing dollars or create specific content, it might be best to focus on those areas.
We could also compare CFV by registration source, as shown in the figure below. It is just like the figure above, except the x-axis represents the registration source for the customers. Using a chart like this allows marketers to pinpoint their ad spend, allocating more resources to the registration sources that produce both a large number of customers and customers who will spend more money.
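A minimal sketch of how these segment-level views can be produced from per-customer CFV predictions; the DataFrame and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical table of customers with predicted CFV and signup attributes.
customers = pd.DataFrame({
    "state": ["CA", "CA", "NY", "TX", "NY", "CA"],
    "registration_source": ["facebook", "organic", "organic", "paid_search", "facebook", "organic"],
    "cfv": [42.0, 130.5, 18.0, 55.0, 9.5, 77.0],
})

# Customer count and average CFV per state (the two y-axes in the state figure).
by_state = customers.groupby("state")["cfv"].agg(customer_count="count", avg_cfv="mean")
print(by_state)

# The same breakdown by registration source.
by_source = customers.groupby("registration_source")["cfv"].agg(customer_count="count", avg_cfv="mean")
print(by_source)
```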
The validation report shows the actual revenue during that period, shown as “Site Actual CFV,” and our prediction, “Site Predicted CFV,” which is what we predicted on 1/1/2015.

The Site Level Mean Absolute Accuracy shows, as a percentage, how far off our prediction of the revenue was from the actual revenue. In this particular example, we predicted the specific company's future revenue to within 92% accuracy -- we were only off by $292K.
We also present the User Level Mean Absolute Error of CFV, which shows how well we predict the future value (e.g., future spend) of each individual user. It's much more difficult to predict each individual's value than the company's value as a whole at the site level. In this particular example, we were, on average, off by $1.75 per customer.
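A minimal sketch of both validation measures, assuming arrays of actual and predicted per-user future spend; the numbers are illustrative, not the client's data.

```python
import numpy as np

# Hypothetical actual vs. predicted future spend per user over the validation window.
actual = np.array([0.0, 12.5, 3.0, 45.0, 0.0, 7.5])
predicted = np.array([1.0, 10.0, 2.5, 50.0, 0.5, 6.0])

# Site-level accuracy: how close the total predicted revenue is to the total actual revenue.
site_accuracy = 1 - abs(predicted.sum() - actual.sum()) / actual.sum()

# User-level mean absolute error: average dollar error per individual customer.
user_mae = np.mean(np.abs(predicted - actual))

print(f"Site level accuracy: {site_accuracy:.1%}")
print(f"User level MAE: ${user_mae:.2f}")
```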
CFV is a key metric for predictive analytics. It creates actionable information from customer data
that really helps move the needle on your business.
Chapter 4: Welcome Purchase Probability, Part 1
As the saying goes, you don't get a second chance to make a first impression. This is true for customers too, and it can have a big impact on a business. At Retention Science, we address this problem by predicting, after a customer first signs up, whether he or she will actually turn into a paying customer. We define this as a customer's Welcome Purchase Probability. WPP predicts whether or not someone will become a purchasing customer based solely on signup data. Marketers can then use this prediction to create campaigns that will resonate with customers more likely to purchase. In many ways, WPP is the start of the retention cycle because, without the ability to get paying customers, there is no one to retain in the first place.
Across a number of different businesses, 62% of customers immediately churn. That is, more than
half of the customers who sign up or register never end up making a purchase. The figure below
shows the immediate churn rate for companies of various sizes. The x-axis shows the company
sizes, ranging from tens of thousands of customers to more than 16 million customers. The immediate churn rate is shown above each bar, as a percentage of the total customers. For instance,
the company with 3.4 million customers has an immediate churn rate of almost 84%, representing
some 2.5 million of their customers.
Immediate churn rate for various companies
On average, more than 60% of eCommerce customers who register with a website fail to make a
purchase. That is a significant waste of customer acquisition dollars and means companies have
already fallen behind in terms of retention, since they are losing so many customers from the
start. By using WPP models, you can effectively address this challenge.
Another important aspect of WPP is that it can produce profiles of likely and unlikely purchasers. For instance, as we explain below, there are certain factors that indicate with strong support whether the user will probably purchase, or not. By investigating these features, marketers can build powerful acquisition schemes, tailored to those most likely to buy. For instance, if WPP identifies college-aged females in the western United States as the most likely purchasers, then you can specifically target those users with advertising and content marketing.
Modeling Approach
How is this prediction actually done? At ReSci, our approach is entirely data driven. We create features based on existing individual customers, and train a classification model -- we've found that ensemble methods work well -- to use these features and predict whether someone purchased something in the past or not. These features span a large range of user description information, such as whether a user registered with the business via Facebook or where the customer is located. In some cases, we have even more detailed customer information, which can potentially be quite discriminative. For one eCommerce company, we found that a particular color preference is a strong predictor of whether someone would convert or not.

Our models yield scores which are aligned with the posterior probability distribution, which allows us to predict, for a future user, whether he or she will purchase based on the combination of features that represents that customer. While this method is useful, we found the most utility by simply bucketing users into two cases: likely to purchase and not likely to purchase. To create these buckets, we simply find an empirical cutoff point in the distribution of the purchase probabilities, and split customers into purchase or not-purchase buckets using this criterion.
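As an illustration only (ReSci's actual features and models are not shown here), a minimal sketch of training a classifier on signup-time features and bucketing users by their predicted purchase probability; the file names, columns, and median cutoff are assumptions.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical signup-time data with a historical label: did the user ever purchase?
signups = pd.read_csv("signups.csv")
features = ["registered_via_facebook", "age_at_signup", "region_code", "favorite_color_code"]

model = GradientBoostingClassifier(random_state=42)
model.fit(signups[features], signups["ever_purchased"])

# Score new registrants: the purchase-class column of predict_proba is the WPP score.
new_users = pd.read_csv("new_signups.csv")
wpp = model.predict_proba(new_users[features])[:, 1]

# Bucket users around an empirical cutoff in the score distribution
# (the median is used here purely as an example).
cutoff = pd.Series(wpp).median()
new_users["wpp_bucket"] = ["likely to purchase" if p >= cutoff else "not likely" for p in wpp]
print(new_users["wpp_bucket"].value_counts())
```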
By making this approach completely data-driven (i.e., predicting based on past data), we can
update the model every day, as new customers register with our clients and become purchasers.
In this way, we can reflect how the WPP is changing based on specific marketing campaigns.
To evaluate the performance of our WPP models, we track what each user actually ends up doing
in the following months after our models make their predictions. We find that we do quite well in
differentiating between good and bad customers, and this identification can make a big difference
for clients. For example, for one client, when we compared the 10% most promising and the 10%
least promising users (by the model’s scoring), we found that the top 10% spent almost 300%
more and converted 40% more often than the bottom 10%.
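A minimal sketch of that kind of top-versus-bottom comparison, assuming each scored user's later spend and conversion have been recorded; the data below is invented for illustration.

```python
import pandas as pd

# Hypothetical outcomes: each user's WPP score plus what they actually did afterward.
outcomes = pd.DataFrame({
    "wpp_score": [0.91, 0.85, 0.80, 0.44, 0.40, 0.12, 0.08, 0.05],
    "spend":     [120.0, 80.0, 60.0, 20.0, 0.0, 0.0, 10.0, 5.0],
    "converted": [1, 1, 1, 1, 0, 0, 1, 0],
})

# Take the top and bottom 10% of users by predicted score (at least one user each).
n = max(1, len(outcomes) // 10)
ranked = outcomes.sort_values("wpp_score", ascending=False)
top, bottom = ranked.head(n), ranked.tail(n)

print(f"Top slice: avg spend ${top['spend'].mean():.2f}, conversion rate {top['converted'].mean():.0%}")
print(f"Bottom slice: avg spend ${bottom['spend'].mean():.2f}, conversion rate {bottom['converted'].mean():.0%}")
```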
Examples of WPP in Action
It’s interesting to note the specific drivers of WPP. That is, what information suggests that someone will become a purchaser? As we mentioned, this problem is challenging because there is a limited amount of information when a user signs up; however, we’ve found that, for some businesses, there is still significant predictive power in that limited information. For example, one common input is the year the customer was born, which allows you to estimate the customer’s age. This turns out to be an interesting feature, as we can see by examining two different clients, which we will call Client A and Client B.
We can examine the learned weights in the model to get a sense of how age impacts the end prediction of purchase likelihood. We can interpret these roughly as probabilities for convenience. The plot in the figure below shows WPP’s weight plotted against the age of the registered customers, in years.
The probabilistic score impact (y-axis) against the age (shown on the x-axis)
for Client A and Client B
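One simple way to get this kind of view (a rough sketch, not the exact analysis behind the figure) is to average the model's predicted scores within age buckets; the scores and ages below are invented.

```python
import pandas as pd

# Hypothetical scored users: predicted WPP plus the age derived from birth year.
scored = pd.DataFrame({
    "age":       [18, 22, 27, 34, 41, 55, 63, 71, 78, 102],
    "wpp_score": [0.21, 0.18, 0.25, 0.37, 0.42, 0.51, 0.66, 0.70, 0.68, 0.30],
})

# Bucket ages into 10-year bands and inspect the average predicted score per band.
scored["age_band"] = pd.cut(scored["age"], bins=range(10, 111, 10))
print(scored.groupby("age_band", observed=True)["wpp_score"].mean())
```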
For Client A (orange), this information is slightly predictive: if the users are between the ages of 15-30, they are less likely to purchase than if they are older (30+). However, the results are noisy. On the other hand, for Client B (blue), there is a consistent relationship between age and purchase likelihood, with the best users being in the 60-80 range.
There are a few other interesting points here. One is related to the distribution of feature values.
Strangely, there seem to be some very elderly users. We can get some more context for this by
looking at the distribution of user signups across age; the numbers have been altered but not the
distribution:
As the graph shows, in both distributions, there is a rather suspicious spike in the number of users
in the tail end, older than 100. While this could be a data translation error, more likely this means
users are lying. These extreme ages are both related to the earliest years users can choose when
they register (around 1900). Users might lie for a number of reasons, such as privacy concerns or
to get past a legal age limit. However, despite this data issue, WPP modeling revealed that age is a
strong predictor of purchase probability, even without other features.
The most predictive features will likely be domain-specific. In the case of Client A, color choice is central to their most popular products, and it turns out this is the most predictive feature for purchasers. This might not be surprising, given the importance of color to their products, but it’s striking that the most predictive color is more than three times as indicative of a potential purchasing customer as the color with the lowest indication. That, then, gives a very strong signal to use in early email communications. The figure below shows the various product colors and their impact on the potential purchase. The colors are ordered from most indicative (top of the figure) to least (at the bottom).
The impact color choice has on whether a registered user will purchase
Evaluating Performance
Of course, the insights gained from a retrospective look at the data are one thing. How do the models actually perform in practice?
Using the WPP model, we get a fine-grained ranking of all users according to their likelihood to
convert. To illustrate, for a gambling company, we scored about 8000 users over Q1 and tracked
their transactions over 6 months.
If we compare our model’s predicted top 10% users with the predicted bottom 10%, we find that
the top 10% spent almost 300% more and converted 40% more often.
If we go further and compare the model’s predicted top 1% of users with the predicted bottom 1%,
the results are even more drastic: the top 1% spent over 700% as much.
WE’LL TEACH YOUR TEAM HOW TO INCREASE YOUR REVENUE