Skip to main content

Lifetime Value

The aim of defining a lifetime value of a customer, or LTV, is to associate a monetary value to an average customer. This value indicate how much we should expect to make from each customer during their lifetime as a customer. Here lifetime refers to the period from them signing up, or starting to use the site, to the user deleting their account, or stopping to use the site or service.

There are several ways to go about defining an LTV. For example for an ecommerce store it could be:

  • The total revenue made for each customer
  • The total margin, or commission, made for each customer

Example Data

For our customer lifetime value example we use data from the following Kaggle dataset Joining Marketing Funnel with Brazilian E-Commerce, which describes the marketing funnel of the Olist ecommerce site.

The dataset is made up of several tables. In this example we have joined a few of them together to produce the following set, which captures available information about each customer on the platform both in terms of orders and products bought. You can download the data set here.

Define LTV

First we need to define what we mean by lifetime value in the context of our current data set.

We define the LTV of a customer as the total amount of money spent on the platform throughout the lifetime of that customer.

To calculate it for each user would look like this

SELECT SUM(payment_value) as ltv, customer_city, customer_state FROM Olist_Product_Data GROUP BY customer_id

Understanding what drives LTV

To better understand the LTV we define a few extra features based on what product a user has bought.

SELECT
(CASE WHEN product_category_name_english IN ('computers_accessories', 'electronics', 'telephony') THEN 1 ELSE 0 END) AS electronics,
(CASE WHEN product_category_name_english IN ('bed_bath_table', 'furniture_decor', 'housewares') THEN 1 ELSE 0 END) AS home
FROM Olist_Product_Data

The column electronics will be 1 if the product is either of the categories computers_accessories, electronics and telephony or 0 otherwise. Likewise for home with respect to bed_bath_table, furniture_decor and housewares.

We use these along with the previous features to predict and explain what drives the LTV of each customer. To do this we aggregate the electronics and home columns as 1 if the column was 1 for any of the rows, ie if the customer ever bought any of electronics or home.

A nice way to gather features for use in SQL-inf is to create a CTE, like shown below.

with ltv_data AS (
SELECT SUM(payment_value) as ltv, customer_city, customer_state,
MAX(CASE WHEN product_category_name_english IN ('computers_accessories', 'electronics', 'telephony') THEN 1 ELSE 0 END) AS electronics,
MAX(CASE WHEN product_category_name_english IN ('bed_bath_table', 'furniture_decor', 'housewares') THEN 1 ELSE 0 END) AS home
FROM Olist_Product_Data GROUP BY customer_id
)

SELECT customer_city, customer_state, electronics, home
FROM ltv_data
EXPLAIN(PREDICT(ltv))

In this case we find that the city of the customer is what primarily drives the prediction of LTV, followed by whether the customer ever bought an electronics product.

Using LTV Predictions

We can also look directly at the predictions of LTV, to find the users with the highest potential lifetime values.

with ltv_data AS (
SELECT SUM(payment_value) as ltv, customer_city, customer_state,
MAX(CASE WHEN product_category_name_english IN ('computers_accessories', 'electronics', 'telephony') THEN 1 ELSE 0 END) AS electronics,
MAX(CASE WHEN product_category_name_english IN ('bed_bath_table', 'furniture_decor', 'housewares') THEN 1 ELSE 0 END) AS home
FROM Olist_Product_Data GROUP BY customer_id
)

SELECT customer_city, customer_state, electronics, home, prediction as predicted_ltv
FROM ltv_data
PREDICT(ltv)
WHERE ltv is NULL
ORDER BY predicted_ltv DESC

By ordering by the predicted LTV, and only selecting those where the LTV is NULL, we can find the customers with the highest lifetime values who have not yet made any payments. This is useful for following up with these users with high potential growth.