Skip to main content

Lead Scoring

A lead scoring method is a way to assign a numeric value to each potential lead, such that the higher the value the more likely they are to convert. The scoring can also include a component indicating the potential value of the lead to the business, in the case they convert to a customer.

Example Data

For our lead scoring example we use data from the following Kaggle dataset Joining Marketing Funnel with Brazilian E-Commerce, which describes the marketing funnel of the Olist ecommerce site.

The dataset is made up of several tables. In this example we have joined a few of them together to produce the following set, which captures all available information about the funnel for signing up new sellers to the platform You can download the data set here.

Understanding what drives conversion

We start our lead scoring analysis by trying to understand what factor drive the conversion from lead to customer.

We do this by predicting whether or not a lead converted into a customer and we then explain this prediction.

In our dataset we can detect whether a lead has converted by check whether they have been assigned a seller_id. Using this we define a new column converted indicating whether the lead converted or not.

SELECT landing_page_id, origin, 
(CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END) as converted
FROM Olist_Seller_Data EXPLAIN(PREDICT(converted))

The SELECT statement is quite simple, only passing in the landing_page_id and origin of the lead. We find that overwhelmingly whether a lead concerts or not is driven by which landing_page_id they first landed on.

Most likely recent leads to converted

Having analysed what drives conversion we can now use this to define a lead score based on the prediction.

We output this for all leads for which the first contact point was after 2018-01-01.

To do this we nest a prediction statement within a filter statement.

SELECT mql_id, first_contact_date, prediction FROM 
(SELECT mql_id, first_contact_date, landing_page_id, origin,
(CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END) as converted,
prediction
FROM Olist_Seller_Data PREDICT(converted, ignore='mql_id, first_contact_date'))
WHERE first_contact_date > '2018-01-01'
ORDER BY prediction DESC

Understanding what drives high monthly revenue

Related to conversion we can also predict the potential revenue of our leads by using the reported monthly revenue of all converted leads.

The first thing to do is to understand the prediction.

SELECT declared_monthly_revenue, landing_page_id, origin 
FROM (SELECT * FROM Olist_Seller_Data WHERE seller_id IS NOT NULL)
EXPLAIN(PREDICT(declared_monthly_revenue, model='reg'))

Expected ARR per lead

Using the conversion and revenue predictions we can define the expected ARR per lead as the expected revenue weighted by the probability of a conversion. We turn this into an ARR number and output it for each lead that has yet to convert.

SELECT left_stmt.mql_id, 12 * conversion_prediction * revenue_prediction as expected_arr FROM (  
(SELECT mql_id, landing_page_id, origin, declared_monthly_revenue,
prediction as revenue_prediction
FROM Olist_Seller_Data PREDICT(declared_monthly_revenue, ignore=mql_id, model='reg')) as left_stmt
INNER JOIN
(SELECT mql_id, landing_page_id, origin,
CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END as converted,
prediction as conversion_prediction
FROM Olist_Seller_Data PREDICT(converted, ignore=mql_id)) as right_stmt
ON left_stmt.mql_id = right_stmt.mql_id
) WHERE converted=0 ORDER BY expected_arr DESC

Total ARR in pipeline by Origin

Having calculated the expected revenue per lead we can aggregate it to get a better overall view of the pipeline. We aggregate and group it by origin to better understand from which channel the most ARR can be expected and where our pipeline looks the healthiest.

SELECT origin, SUM(12 * conversion_prediction * revenue_prediction) as expected_arr FROM (  
(SELECT mql_id as mql, landing_page_id, origin as org, declared_monthly_revenue, prediction as revenue_prediction
FROM Olist_Seller_Data PREDICT(declared_monthly_revenue, ignore=mql_id, model='reg')) as left_stmt
INNER JOIN
(SELECT mql_id, landing_page_id, origin, CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END as converted,
probability as conversion_prediction
FROM Olist_Seller_Data PREDICT(converted, ignore=mql_id)) as right_stmt
ON left_stmt.mql = right_stmt.mql_id
) WHERE converted=0 GROUP BY origin ORDER BY expected_arr DESC