Understanding the intuition behind Retention Analysis

Yash Gupta
Data Science Simplified
7 min read · Nov 28, 2023

If you’ve been a part of a B2C or D2C business, there’s a good chance that one of the key metrics your company tracks is Retention. The term ‘retention’ literally means ‘to remain’. The concept applies in many settings: in education, where students are retained for an additional year, or in HR, where employees are retained to keep working for your organization instead of moving out.

Retention is one of the core metrics for tracking business performance if your organization deals with customers. Calculating it also helps you understand customer lifetime and keeps your efforts directed at improving customer retention and, with it, the overall value of your customer base.

In this article we’ll cover a few things around Retention and hopefully solidify the idea for you to use in your future analyses.

How to calculate Retention?

Retention, as the word goes, means ‘to sustain’. There are different ways to calculate retention, but the simplest and best way is to pick a cohort and calculate the percentage of customers who stay out of the customers who started!

Sounds unfamiliar?

Actually, it’s pretty simple so let’s break that down. There are 2 things required to calculate retention.

Cohort Customers: the total customers who started together (on a given day, week, month, etc.). A cohort is usually a time-bound category of individuals.

Retained Customers: the customers who are still a part of your customer base after a stipulated period of time.

And simply, the retention calculation then becomes…

Retention % after ‘x’ days = (Retained Customers after ‘x’ days / Cohort Customers) × 100
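The formula translates almost word for word into code. Here is a minimal sketch in Python; the function name `retention_rate` and the sample numbers are my own, picked only for illustration.

```python
def retention_rate(retained_customers: int, cohort_customers: int) -> float:
    """Return retention as a percentage of the original cohort size."""
    if cohort_customers <= 0:
        raise ValueError("cohort_customers must be positive")
    return 100.0 * retained_customers / cohort_customers


# Hypothetical numbers: 40 of the 50 customers who started are still around.
print(retention_rate(40, 50))  # 80.0
```

The guard against an empty cohort is there because dividing by a cohort of zero customers has no meaningful answer.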

Let us get into understanding what the usual retention scenario looks like in an organization.

Case:

Say you are a part of a company that offers a product to customers on a subscription basis. Retention in this case can be as simple as answering this question (for monthly retention rates):

Out of the customers who start a subscription this month, how many are still subscribed next month, the month after that, and so on?

In a usual business setup, not all of your customers will keep renewing their subscription every month indefinitely. That drop-off is what you need to track in order to make the right decisions, so that customers stay subscribed for as long as possible and keep contributing value to your business.

Let’s say this is how the data looks:

Month 1: 100 subscribers, all new

Month 2: 120 subscribers, of which there are 60 new

In this case, it’s important to note that your customer base changes every month. Customers who start next month should NOT be a part of the retention analysis for the customers who start this month.

This is where the cohort differs.

Month 1 can be considered as one cohort (cohort 1) while Month 2 can be considered as the other (cohort 2).

Let’s calculate retention in Month 2 for customers starting in Cohort 1. We know there are a total of 120 customers in Month 2, of which 60 are new, so the remaining 60 must have started in Cohort 1.

Therefore, retention in Month 2 for Cohort 1 becomes 60/100, i.e., 60%.

Let’s say only 30 of these remain in Month 3. The cohort is still Cohort 1, so Month 3 retention would be 30/100, i.e., 30%.

To add a bit more to this, let us say that of the customers who started in Cohort 2, 45 are left in Month 3. Retention in Month 3 for Cohort 2 then becomes 45/60 (retained in Month 3 / new in Month 2), i.e., 75%.
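The same arithmetic can be written as a small Python sketch. The numbers are the ones from the example above; the `active` dictionary and the `retention` helper are just names I have made up to hold them.

```python
# Subscriber counts from the example, split by the cohort they started in.
# active[cohort][month] = customers from that cohort still subscribed in that month.
active = {
    1: {1: 100, 2: 60, 3: 30},  # Cohort 1 started in Month 1
    2: {2: 60, 3: 45},          # Cohort 2 started in Month 2
}


def retention(cohort: int, month: int) -> float:
    """Retention % of `cohort` in `month`, relative to its starting size."""
    cohort_size = active[cohort][cohort]  # size in the month the cohort started
    return 100.0 * active[cohort][month] / cohort_size


print(retention(1, 2))  # 60.0
print(retention(1, 3))  # 30.0
print(retention(2, 3))  # 75.0
```

Note that the denominator is always the cohort's own starting size, never the total number of subscribers in that month.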

Seems like quite a task to track?

Let’s see how we can put these numbers together in Excel. I’ll add a few more data points in the next steps to clear the air, and I’ve changed the numbers so they’re easier to read.

Basic retention calculations

Let’s say that you now have to compare retention across multiple cohorts. The amount of data can quickly become massive.

Given below are two ways to counter this problem.

Comparison of Retention rates for different cohorts is easier when you put all the data you need together (either in a table or a graph).

The tabular representation of the retention data can be called a retention triangular while a graphical representation would simply be a retention curve.

Each of the two can be used for relatively easy analysis of the numbers.

Note: Retention typically shows a declining trend, but in certain business models customers can come back. In those cases the curve slopes downward only over the long term and can have its own peaks and troughs along the way.

Retention Triangular (Tabular):

The retention table is called a triangular because of its shape. Let’s say you have data from January 2023 onwards and you are currently in the month of July (which is still ongoing, so you have data only until the end of June).

Retention triangular with customer volumes

In this case, your data for January’s cohort is available until June (M0 is the initial month of the cohort itself, and retention starts from the first subsequent month, i.e., M1 in February, and so on until M5 in June).

Similarly, each later cohort has one fewer data point than the one before it, which makes your table look like this: a simple upper-left triangular dataset.

Now, while the tabular version of the numbers is good, a little effort goes a long way. This can be made better using percentages, i.e., your actual retention percentages (and some conditional formatting).

Retention triangular with retention rates

Yes, all your customers are retained in the cohort month and therefore the retention rate will be 100% for M0 across all cohorts.
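If you would rather build the percentage table in code than in Excel, here is a rough sketch in Python. The volumes below are made up for illustration and are not the numbers from the table above; `volumes`, `to_rates` and `triangle` are hypothetical names.

```python
# Hypothetical customer volumes (NOT the figures from the table above).
# Each list holds M0, M1, M2, ... for one cohort; later cohorts have fewer months.
volumes = {
    "Jan": [200, 180, 150, 120],
    "Feb": [150, 120, 90],
    "Mar": [100, 70],
    "Apr": [80],
}


def to_rates(counts):
    """Convert a cohort's monthly volumes into retention % relative to M0."""
    base = counts[0]
    return [round(100 * c / base) for c in counts]


triangle = {cohort: to_rates(counts) for cohort, counts in volumes.items()}

# Print the triangle with an M0..Mn header row.
n_months = max(len(rates) for rates in triangle.values())
print("Cohort " + " ".join(f"M{m:<4}" for m in range(n_months)))
for cohort, rates in triangle.items():
    print(f"{cohort:<6} " + " ".join(f"{r:>3}% " for r in rates))
```

Every row starts at 100% because each cohort is divided by its own M0 volume, and the rows get shorter for later cohorts, which is what gives the table its triangular shape.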

Immediately, we can see that early-month retention was very poor for the March 2023 cohort, even though it has a significant number of customers…

Calls for some discussion with your team on what went wrong, maybe?

Pro tip:
In case you want to know the general trend for retention, there’s a simple way to get it: averaging each retention month across all cohorts (for the available data) gives you the general retention rate for that month.

For example, overall M1 retention in the case above would be the average of all the M1 rates given (90, 90, 70, 83, 88), which comes to about 84%.
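As a quick check of that average, here is a minimal sketch using Python's standard library. The variable names are my own.

```python
from statistics import mean

# M1 retention rates (%) for the five cohorts, as quoted above.
m1_rates = [90, 90, 70, 83, 88]

overall_m1 = mean(m1_rates)
print(round(overall_m1))  # 84
```

This is a simple average, so every cohort counts equally regardless of its size. If your cohorts differ a lot in volume, an average weighted by cohort size may be worth considering instead.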

Retention Curve (Graphical):
Let’s assume that you have the same data but for 10 months now. For a single cohort, the retention curve would look something like this:

example retention curve with volumes

The curve is better to use because it shows you exactly how retention changes over time. Curves can take any shape depending on the kind of business you are in.

(It can also clearly reveal any SEASONALITY affecting your business.)

A curve for a different cohort can be added alongside to compare how retention differs from cohort to cohort and to take necessary actions as required.
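To make the comparison concrete, here is a rough text-only sketch in Python that draws two curves side by side without any charting library. The two cohorts and their volumes are invented for illustration, and `curve` is a hypothetical helper.

```python
# Hypothetical retained volumes for two cohorts, M0 onward (illustrative only).
cohorts = {
    "Cohort A": [500, 400, 350, 300, 275],
    "Cohort B": [400, 280, 240, 220, 200],
}


def curve(volumes):
    """Retention % at each month, relative to the cohort's M0 volume."""
    return [100 * v / volumes[0] for v in volumes]


curves = {name: curve(vols) for name, vols in cohorts.items()}

# Draw each curve as rows of '#' (one '#' per 5 percentage points).
for name, rates in curves.items():
    print(name)
    for month, rate in enumerate(rates):
        print(f"  M{month}: {'#' * int(rate // 5)} {rate:.0f}%")
```

I have converted volumes to percentages before comparing because the two cohorts start at different sizes. Comparing raw volumes would mostly show which cohort was bigger, not which one retained better.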

cohort to cohort comparison for retained volumes

If you plot a longer-term retention curve, it should ideally look like the one given below (a general perspective on a subscription-model business, as we discussed):

example long term retention curve (does not have to be precisely like this every time you plot one)

I’ve added the orange line to show the trend. As you can see, the decline can take any shape, and customers who come back can lift the curve up for a while, but eventually it keeps declining.

Note: A retention curve ideally never touches 0%; if it does, the business is in serious trouble.

Interpreting Retention Rates

This is just a placeholder because interpretation is beyond the scope of this article, but I will write a dedicated piece on it soon and add it here for everyone to refer to!

Thanks for reading all the way here. Do add a comment with your thoughts on the article.

For all my articles:

Connect with me on LinkedIn: https://www.linkedin.com/in/yash-gupta-dss/

~ P.S. All the views mentioned in the article are my sole opinions. I enjoy sharing my perspectives on Data Science. Do contact me on LinkedIn at — Yash Gupta — if you want to discuss all things related to data further!

Yash Gupta
Data Science Simplified

Business Analyst at Lognormal Analytics and Data Science Enthusiast! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss