Churn Analysis
The purpose of churn analysis is to analyse and measure the rate at which customers are leaving your product. Depending on your product, the behavior that you want to analyse can take many shapes.
For example churn can be defined as:
- a customer terminates their contract with you
- a user deletes their account with you
- a user uninstall your app from their devices
- a customer does not return to your website or location
To analyse churn you first need to define and capture the behavior that feels most appropriate for your product. You then need to decide on the features that you feel would be most appropriate to analyse to understand the churn.
Once you have collected the data there are a few ways that you can go about analysing it to better understand churn.
- Predicting the probability of churn for customers that are still active
- Understanding what drives churn by explaining the prediction
- Predicting potential losses from churn in the short term or by future periods
- Understanding what drives churn by different data cuts, like country or demographics
- Predicting potential losses by different data cuts, like country or demographics
In the below example we will use a demo data set to explain and describe these methods in more detail.
Example Data
In the following examples we will use data from the Kaggle Dataset Bank Customer Churn Dataset , which describes the customers of a bank, where some have churned and others are still active users.
For each user we have data on demographics and some high level bank details.
You can download the raw dataset that fits with below queries directly here.
Predicting churn
In our first examples we will predict the probability of churn for all points in the data set. We will then sort the output by the likelihood of churn.
The query predicts the probability of churn by learning from already churned customers; if customers haven’t churned yet, we can predict how likely they are to churn.
We input all columns from the data set Bank_Customer_Churn_Prediction
but filter out the customer_id
column from
the prediction, using the ignore
flag, since we do not believe it will contribute to the prediction
but only add noise.
SELECT Bank_Customer_Churn_Prediction.*, prediction
FROM Bank_Customer_Churn_Prediction
PREDICT(churn, ignore=customer_id) ORDER BY prediction DESC
The output column prediction
is an output of the PREDICT method, the value being the predicted outcome;
1 if churn is predicted and 0 otherwise. The column probability
is an output of the PREDICT method,
giving the probability of the predicted outcome being positive (i.e. churn = 1).
Understanding churn
The next step of the analysis is to understand what drives the prediction to better understand what can be done to prevent churn or what signs to look out for.
To do this we extend the previous analysis by wrapping the prediction with an EXPLAIN
.
For each input column this will output a value between 0 and 1.0 indicating how influential that column is to the
prediction of churn.
It uses all columns from the input dataset *
, except 'customer_id', to learn about 'churn'
and returns how important each column is to the prediction of churn.
SELECT * FROM Bank_Customer_Churn_Prediction
EXPLAIN(PREDICT(churn, ignore=customer_id))
In this example we find that age
is the main driver of churn.
Most likely high value customers to churn
Now that we understand how to predict churn and what drives it, let's look at who is most likely to churn that would significantly impact our business.
To do this we filter our prediction, firstly by who has more than 85% likely to churn and secondly by who has more than 100k in the account and hasn't churned yet.
Note that the inner SELECT
statement uses all the data points, churned or not,
to learn what and predict the probability of churn.
We then in the outer statement filter out the accounts that are yet to churn and has a high balance.
In total, the query applies 3 filters to the previous "Predict churn" query:
- Filter 1 - High probability of churn (>75%);
- Filter 2 - High value customer with 100k+ in their account;
- Filter 3 - Customers that haven’t yet churned
SELECT customer_id, balance FROM
(SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id)
WHERE probability > 0.85 and prediction=1)
WHERE balance > 100000 and churn=0
Expected loss of balance per customer
Now that we understand how to predict churn we can define the notion of "expected loss" for a customer as the probability of them churning times their balance.
This gives an indication of the balance at risk for all accounts, not just high value or high churn probability ones.
We compute this by an estimate of the likely loss of balance of customers who haven't yet churned
but are predicted to churn.
We use balance
times probability
to calculate expected loss, rename the output to match
and order by size of the expected loss.
SELECT customer_id, balance*probability AS expected_loss FROM
(SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id))
WHERE churn=0 ORDER BY expected_loss DESC
Expected average loss of balance per country
Once we have defined the notion of "expected loss" on an account basis, we can aggregate these across accounts and get different views on the total or average expected loss by subsets of the data. This gives us an insight into which countries are most exposed.
In this case we compute the average expected loss per account by country. By averaging the expected loss we can compare across countries more easily. We could also have summed and gotten a view of the total expected loss for each country.
SELECT country, AVG(balance*probability) AS expected_loss FROM
(SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id))
WHERE churn=0 GROUP BY country