# Churn Analysis

The purpose of churn analysis is to analyse and measure the rate at which customers are leaving your product. Depending on your product, the behavior that you want to analyse can take many shapes.

For example churn can be defined as:

• a customer terminates their contract with you
• a user deletes their account with you
• a user uninstall your app from their devices

To analyse churn you first need to define and capture the behavior that feels most appropriate for your product. You then need to decide on the features that you feel would be most appropriate to analyse to understand the churn.

Once you have collected the data there are a few ways that you can go about analysing it to better understand churn.

• Predicting the probability of churn for customers that are still active
• Understanding what drives churn by explaining the prediction
• Predicting potential losses from churn in the short term or by future periods
• Understanding what drives churn by different data cuts, like country or demographics
• Predicting potential losses by different data cuts, like country or demographics

In the below example we will use a demo data set to explain and describe these methods in more detail.

## Example Data​

In the following examples we will use data from the Kaggle Dataset Bank Customer Churn Dataset , which describes the customers of a bank, where some have churned and others are still active users.

For each user we have data on demographics and some high level bank details.

You can download the raw dataset that fits with below queries directly here.

## Predicting churn​

In our first examples we will predict the probability of churn for all points in the data set. We will then sort the output by the likelihood of churn.

The query predicts the probability of churn by learning from already churned customers; if customers haven’t churned yet, we can predict how likely they are to churn.

We input all columns from the data set `Bank_Customer_Churn_Prediction` but filter out the `customer_id` column from the prediction, using the `ignore` flag, since we do not believe it will contribute to the prediction but only add noise.

``SELECT Bank_Customer_Churn_Prediction.*, prediction     FROM Bank_Customer_Churn_Prediction     PREDICT(churn, ignore=customer_id) ORDER BY prediction DESC``

The output column `prediction` is an output of the PREDICT method, the value being the predicted outcome; 1 if churn is predicted and 0 otherwise. The column `probability` is an output of the PREDICT method, giving the probability of the predicted outcome being positive (i.e. churn = 1).

## Understanding churn​

The next step of the analysis is to understand what drives the prediction to better understand what can be done to prevent churn or what signs to look out for.

To do this we extend the previous analysis by wrapping the prediction with an `EXPLAIN`. For each input column this will output a value between 0 and 1.0 indicating how influential that column is to the prediction of churn.

It uses all columns from the input dataset `*`, except 'customer_id', to learn about 'churn' and returns how important each column is to the prediction of churn.

``SELECT * FROM Bank_Customer_Churn_Prediction     EXPLAIN(PREDICT(churn, ignore=customer_id))``

In this example we find that `age` is the main driver of churn.

## Most likely high value customers to churn​

Now that we understand how to predict churn and what drives it, let's look at who is most likely to churn that would significantly impact our business.

To do this we filter our prediction, firstly by who has more than 85% likely to churn and secondly by who has more than 100k in the account and hasn't churned yet.

Note that the inner `SELECT` statement uses all the data points, churned or not, to learn what and predict the probability of churn. We then in the outer statement filter out the accounts that are yet to churn and has a high balance.

In total, the query applies 3 filters to the previous "Predict churn" query:

• Filter 1 - High probability of churn (>75%);
• Filter 2 - High value customer with 100k+ in their account;
• Filter 3 - Customers that haven’t yet churned
``SELECT customer_id, balance FROM     (SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id)     WHERE probability > 0.85 and prediction=1) WHERE balance > 100000 and churn=0``

## Expected loss of balance per customer​

Now that we understand how to predict churn we can define the notion of "expected loss" for a customer as the probability of them churning times their balance.

This gives an indication of the balance at risk for all accounts, not just high value or high churn probability ones.

We compute this by an estimate of the likely loss of balance of customers who haven't yet churned but are predicted to churn. We use `balance` times `probability` to calculate expected loss, rename the output to match and order by size of the expected loss.

``SELECT customer_id, balance*probability AS expected_loss FROM     (SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id))     WHERE churn=0 ORDER BY expected_loss DESC``

## Expected average loss of balance per country​

Once we have defined the notion of "expected loss" on an account basis, we can aggregate these across accounts and get different views on the total or average expected loss by subsets of the data. This gives us an insight into which countries are most exposed.

In this case we compute the average expected loss per account by country. By averaging the expected loss we can compare across countries more easily. We could also have summed and gotten a view of the total expected loss for each country.

``SELECT country, AVG(balance*probability) AS expected_loss FROM     (SELECT * FROM Bank_Customer_Churn_Prediction PREDICT(churn, ignore=customer_id))    WHERE churn=0 GROUP BY country``