SQL Transformations For Calculating Stripe MRR

A common request we see both here at Definite and from our customers is MRR reporting using Stripe data. Yes, the Stripe UI already provides a handy dashboard with these metrics, but often times customers want to tweak metric definitions or enrich it with data from other systems such as Hubspot. After extracting the raw data from the Stripe API into a data warehouse, one must apply a series of transformations to the data in order to calculate MRR.

Doing this is not straightforward and in this post we share some of the pitfalls we encountered along with our MRR calculation methodology. Note that the methodology shared here works for our specific use case and way of calculating MRR, other methods may vary depending on business requirements.

Extracting Stripe Data And The Data Schema

The first step is to extract the raw Stripe data. For data extraction we use a forked version of the Singer Stripe tap, and load the data into a DuckDB data warehouse. The outputted schema for the data can be found here. Here is an example of what the balance_transactions object’s schema looks like:

{
  "properties": {
    "fee": {
      "type": [
        "null",
        "integer"
      ]
    },
    "currency": {
      "type": [
        "null",
        "string"
      ]
    },
    "source": {
      "type": [
        "null",
        "string"
      ]
    },
    "fee_details": {
      "type": [
        "null",
        "array"
      ],
      "items": {
        "properties": {
          "application": {
            "type": [
              "null",
              "string"
            ]
          },
          "type": {
            "type": [
              "null",
              "string"
            ]
          },
          "description": {
            "type": [
              "null",
              "string"
            ]
          },
          "amount": {
            "type": [
              "null",
              "integer"
            ]
          },
          "currency": {
            "type": [
              "null",
              "string"
            ]
          }
        },
        "type": [
          "null",
          "object"
        ]
      }
    },
    "available_on": {
      "type": [
        "null",
        "integer"
      ]
    },
    "status": {
      "type": [
        "null",
        "string"
      ]
    },
    "description": {
      "type": [
        "null",
        "string"
      ]
    },
    "net": {
      "type": [
        "null",
        "integer"
      ]
    },
    "exchange_rate": {
      "type": [
        "null",
        "number"
      ]
    },
    "type": {
      "type": [
        "null",
        "string"
      ]
    },
    "sourced_transfers": {
      "items": {},
      "type": [
        "null",
        "array"
      ]
    },
    "id": {
      "type": [
        "null",
        "string"
      ]
    },
    "object": {
      "type": [
        "null",
        "string"
      ]
    },
    "created": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    },
    "amount": {
      "type": [
        "null",
        "integer"
      ]
    },
    "updated": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    }
  },
  "type": [
    "null",
    "object"
  ]
}

Common Pitfalls

Now that we have the data and know its schema, let’s discuss some of the pitfalls we encountered when trying to calculate MRR with this data:

Pitfall 1: Subscriptions object only returns the most recent state of each subscription

Other approaches found in other guides and articles use the subscriptions object in Stripe to calculate MRR. In theory this works since it has start date, end date, plan interval (month, quarter, annual, etc.), and amounts. However, the Stripe API only returns the most recent state of each subscription; we don’t know if a plan was stopped and restarted, if its price changed, if the subscriber added or removed services to the plan, if its interval changed, and more. As a result using subscriptions will not provide an accurate historical view of MRR.

Solution: we must use historical invoices instead of subscriptions to to build an MRR history table.

Pitfall 2: Invoice period start and end dates are not to be trusted

Each invoice has a start and end date. Though it might tempting to use them, these dates are actually start and end dates for when items were added to the invoice, not billing periods for each subscription.

Solution: use invoice line items. Line items in invoices have a period object with start and end dates. We should use these dates to determine monthly subscription periods.

Pitfall 3: Not all invoice line items are for subscription plans

Non-recurring charges can be found in invoice line items (one-off fees, late fees, etc.) We need to filter these out and not include them in MRR calculations.

Solution: filter invoice line items where type = subscription.

Pitfall 4: Discounts are not applied to invoice line item amounts

While the invoice object shows the final due amount, invoice line items shows the original amount without discounts. For each line item we must look up if it has a discount and if so, subtract the discount amount.

Solution: join discounts object to invoice line items and subtract discount amount (if any).

Pitfall 5: some subscriptions are not invoiced monthly (eg quarterly or annual plans)

It’s not uncommon to find subscriptions that are billed annually, quarterly, or even once every 6 months. In these scenarios the corresponding invoice line item will only appear once per billing cycle rather than monthly. Furthermore, the amount due will be the total amount for that period and not the monthly amount.

Solution: Normalize non-monthly plans so that revenue is divided up for each month during which the plan is active. This was the most tricky part and we show some SQL transformations on how to do this below.

SQL Transformations

Below is an outline of how we calculated MRR. Some assumptions we made:

The data and schema closely matches the Singer Stripe extractor output
Each invoice line item’s period start month is the month that revenue is being booked for
Churn month is the last invoice line item’s period end month

Step 1: Gather relevant invoice line items

with discounts as (
    select 
        id,
        unnest(from_json(json(discount_amounts), '["JSON"]')).amount as discount_amount
    from stripe.invoice_line_items 
    where type = 'subscription'
),

line_items as (

    select 
        i.id,
        i.subscription,
        -- subtract discounts
        (i.amount - coalesce(d.discount_amount::int, 0)) / 100 as amount,
        date_trunc('month', (period ->> 'start')::date) as period_start_month,
        date_trunc('month', (period ->> 'end')::date) as period_end_month,
        -- calculate invoice period length in months
        datediff('month', (period ->> 'start')::date, (period ->> 'end')::date) as period_length
    from stripe.invoice_line_items as i
    left join discounts as d
        on i.id = d.id
    join stripe.invoices as i2 
        on i.invoice = i2.id
    where type = 'subscription' -- filter for subscription line items
)

select * from line_items;

Note that in the SQL above we:

filtered for line items that are from subscriptions only
subtracted discounts from amounts
calculated line item period length in months

Step 2: Normalize quarterly and annual plans

In our data we only had monthly, 3-month, and 12 month plans. Other Stripe instances may have different intervals (eg 6 month plans), so adjust accordingly. Let’s look at example of a 3-month subscription plan with a renewals:

We want to normalize this subscription so that each invoice line item is broken out into 3 separate months. For example the first line item for the period of 2023-11-01 to 2024-02-01, should have 3 rows one for each month between those start and end periods. Furthermore we have to divide each amount by the period length since we want to find the normalized revenue in each individual month (amount / 3 ).

Building upon the previous SQL, we can create this normalized view of multi-month plans like so:

...

quarter_plans as (
    select * 
    from line_items
    where period_length = 3
),

time_series as (
  SELECT 
      ('2017-01-01'::date + gs.i * INTERVAL '1 month')::date AS month
  FROM 
      generate_series(0, 240) AS gs(i) -- Alias generate_series as gs
  WHERE 
      ('2017-01-01'::date + gs.i * INTERVAL '1 month')::date <= date_trunc('month', current_date)
),

normalized_quarter_plans as (

select 
    ts.month,
    qp.subscription,
    qp.period_start_month,
    qp.period_end_month,
    qp.period_length,
    qp.amount / qp.period_length as mrr -- divide invoice amount by period length to normalize to month as mrr
from time_series as ts
cross join quarter_plans as qp
where ts.month >= qp.period_start_month and ts.month < qp.period_end_month
)

select * from normalized_quarter_plans;

The normalized result for the same 3-month subscription looks like this now:

Notice that we did not include a record for 2024-11-01 since there was no new invoice line item starting in that month. We consider this subscription to be churned in 2024-11-01 (as per our previously mentioned assumption).

You will want to use the same method for all multi-month invoice line items and then union those records with regular monthly invoices.

Step 3: Join your normalized monthly revenues with a time-series data set

After creating normalized monthly revenues for each invoice, we now need to create a time-series entry for each month. To do this we will utilize our time-series CTE again and join it to every normalized invoice month. Why do we have to do this? If a subscription is canceled and then restarted a few months later, our normalized invoices will not have records for those inbetween months, thus we have to do a cross join with the time-series CTE to also get those months. It is important that we do this so that we can later calculate reactivation MRR.

The SQL below shows the general idea of how to do this:

...

normalized_invoices as (
	// union of all month-normalized annual, quarter, month, etc. plans
),

time_series as (
  SELECT 
      ('2017-01-01'::date + gs.i * INTERVAL '1 month')::date AS month
  FROM 
      generate_series(0, 240) AS gs(i) -- Alias generate_series as gs
  WHERE 
      ('2017-01-01'::date + gs.i * INTERVAL '1 month')::date <= date_trunc('month', current_date)
),

agg_month as (
    select
        month, 
        subscription,
        sum(mrr) as mrr -- there can be multiple invoices in the same month
    from normalized_invoices
    group by 1, 2
),

-- join to subscriptions to get subscription start and end months
agg_month_2 as (
    select 
        a.month,
        a.subscription,
        a.mrr,
        date_trunc('month', s.start_date) as subscription_start_month,
        date_trunc('month', greatest(s.cancel_at, s.canceled_at)) as subscription_end_month
    from agg_month as a
    join stripe.subscriptions as s
        on a.subscription = s.id
),

-- CTE of every subscription and month combo, need in case a subscription stops and starts
agg_month_timeseries as (
        select 
            ts.month,
            ag.*
        from time_series as ts
        cross join (
            select distinct
                subscription,
                subscription_start_month,
                subscription_end_month
            from agg_month_2
        ) as ag
        where ts.month >= ag.subscription_start_month
),

mrr_timeseries as (
    select 
        ait.month,
        ait.subscription,
        ait.subscription_start_month,
        ait.subscription_end_month,
        coalesce(ag.mrr, 0) as mrr
    from agg_month_timeseries ait
    left join agg_month_2 ag
        on ait.month = ag.month
        and ait.subscription = ag.subscription
)

select * from mrr_timeseries;

The result for a single subscription looks something like this:

Notice that even for months during which there was no invoice (2024-06-01 to 2024-07-01, 2024-10-01), we still have rows for those months with MRR of 0.

Step 4: Calculate MRR metrics

Now we can calculate MRR metrics such as churn, expansion, contraction, etc.

...

mrr_timeseries_2 as (
    select 
        month,
        subscription,
        subscription_start_month,
        subscription_end_month,
        coalesce(lag(mrr) over (partition by subscription order by month asc), 0) as mrr_prev_month,
        mrr - coalesce(lag(mrr) over (partition by subscription order by month asc), 0) as mrr_change,
        row_number() over (partition by subscription order by month asc) mrr_rank_asc,
        row_number() over (partition by subscription order by month desc) mrr_rank_desc
    from mrr_timeseries
),

final as (
select 
    month,
    subscription,
    subscription_start_month,
    subscription_end_month,
    
    -- mrr
    case 
        when mrr > 0 then mrr
        when mrr = 0 and mrr_rank_desc = 1 and (subscription_end_month > date_trunc('month', current_date) or subscription_end_month is null) then mrr_prev_month --invoice not issued yet
        else mrr
    end as mrr,
    
    -- new mrr
    case
        when mrr_rank_asc = 1 then mrr_change
        else 0
    end as new_mrr,
    
		-- contraction mrr
    case
        when mrr_change < 0 and mrr > 0 then mrr_change
        else 0
    end as contraction_mrr,
    
    -- expansion mrr
    case
        when mrr_change > 0 and mrr_rank_asc > 1 and mrr_prev_month > 0 then mrr_change
        else 0
    end as expansion_mrr,
    
    -- reactivation mrr
    case
        when mrr_change > 0 and mrr_rank_asc > 1 and mrr_prev_month = 0 then mrr_change
        else 0
    end as reactivation_mrr,
    
    -- churn mrr
    case
        when mrr = 0 and mrr_rank_desc = 1 and (subscription_end_month > date_trunc('month', current_date) or subscription_end_month is null) then 0 --invoice not issued yet
        when mrr = 0 and mrr_prev_month > 0 then mrr_change
        else 0
    end as churn_mrr
    
from mrr_timeseries_2
)

select * from final;

The final output should contain records for each subscription in every month from subscription start date:

Conclusion

The methodology mentioned above is what currently works for us and for our customers. It should be taken as a general outline on how to approach calculating MRR with raw Stripe data, and can be tweaked to fit different needs. If you’re looking for something like this but don’t want to go through the trouble of extracting data from the Stripe API, loading it into a data warehouse, and running the SQL transformations yourself checkout https://www.definite.app/, where we’ve done the heavy lifting already.

Calculating MRR From Raw Stripe Data Is Tricky: Here’s How We Did It