A lead scoring method is a way to assign a numeric value to each potential lead, such that the higher the value the more likely they are to convert. The scoring can also include a component indicating the potential value of the lead to the business, in the case they convert to a customer.

## Example Data​

For our lead scoring example we use data from the following Kaggle dataset Joining Marketing Funnel with Brazilian E-Commerce, which describes the marketing funnel of the Olist ecommerce site.

The dataset is made up of several tables. In this example we have joined a few of them together to produce the following set, which captures all available information about the funnel for signing up new sellers to the platform You can download the data set here.

## Understanding what drives conversion​

We start our lead scoring analysis by trying to understand what factor drive the conversion from lead to customer.

We do this by predicting whether or not a lead converted into a customer and we then explain this prediction.

In our dataset we can detect whether a lead has converted by check whether they have been assigned a `seller_id`. Using this we define a new column `converted` indicating whether the lead converted or not.

``SELECT landing_page_id, origin,     (CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END) as converted     FROM Olist_Seller_Data EXPLAIN(PREDICT(converted))``

The `SELECT` statement is quite simple, only passing in the `landing_page_id` and `origin` of the lead. We find that overwhelmingly whether a lead concerts or not is driven by which `landing_page_id` they first landed on.

## Most likely recent leads to converted​

Having analysed what drives conversion we can now use this to define a lead score based on the prediction.

We output this for all leads for which the first contact point was after 2018-01-01.

To do this we nest a prediction statement within a filter statement.

``SELECT mql_id, first_contact_date, prediction FROM     (SELECT mql_id, first_contact_date, landing_page_id, origin,     (CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END) as converted,     prediction     FROM Olist_Seller_Data PREDICT(converted, ignore='mql_id, first_contact_date'))     WHERE first_contact_date > '2018-01-01'    ORDER BY prediction DESC``

## Understanding what drives high monthly revenue​

Related to conversion we can also predict the potential revenue of our leads by using the reported monthly revenue of all converted leads.

The first thing to do is to understand the prediction.

``SELECT declared_monthly_revenue, landing_page_id, origin     FROM (SELECT * FROM Olist_Seller_Data WHERE seller_id IS NOT NULL)     EXPLAIN(PREDICT(declared_monthly_revenue, model='reg'))``

``SELECT left_stmt.mql_id, 12 * conversion_prediction * revenue_prediction as expected_arr FROM (      (SELECT mql_id, landing_page_id, origin, declared_monthly_revenue,         prediction as revenue_prediction         FROM Olist_Seller_Data PREDICT(declared_monthly_revenue, ignore=mql_id, model='reg')) as left_stmtINNER JOIN     (SELECT mql_id, landing_page_id, origin,         CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END as converted,        prediction as conversion_prediction        FROM Olist_Seller_Data PREDICT(converted, ignore=mql_id)) as right_stmtON left_stmt.mql_id = right_stmt.mql_id) WHERE converted=0 ORDER BY expected_arr DESC``
Having calculated the expected revenue per lead we can aggregate it to get a better overall view of the pipeline. We aggregate and group it by `origin` to better understand from which channel the most ARR can be expected and where our pipeline looks the healthiest.
``SELECT origin, SUM(12 * conversion_prediction * revenue_prediction) as expected_arr FROM (  (SELECT mql_id as mql, landing_page_id, origin as org, declared_monthly_revenue, prediction as revenue_prediction     FROM Olist_Seller_Data PREDICT(declared_monthly_revenue, ignore=mql_id, model='reg')) as left_stmtINNER JOIN (SELECT mql_id, landing_page_id, origin, CASE WHEN seller_id IS NULL THEN 0 ELSE 1 END as converted,    probability as conversion_prediction    FROM Olist_Seller_Data PREDICT(converted, ignore=mql_id)) as right_stmtON left_stmt.mql = right_stmt.mql_id) WHERE converted=0 GROUP BY origin ORDER BY expected_arr DESC``