10 Ways you can dissect your KPIs! — (Part 1)

Yash Gupta
Data Science Simplified
8 min read · May 22, 2023


Have you ever looked at a number and wondered, what can I do with this? Or rather, what can this number tell me? Maybe a lot. But have you considered that the main reason your numbers aren’t telling you much about your data is that you’ve been looking at them the wrong way?

There are many different ways to look at numbers, and it gets better when you dissect them differently. A number can be examined from every direction and perspective, and each new angle can tell you something new about your data.

In this article, we’ll dive deeper into different ways you can look at numbers and see them anew. The ten ways we’ll cover are:

Data/KPI Dissection ideas:

*KPI — Key Performance Indicator

  1. Simple aggregations (Sum/Average/Median etc.)
  2. Other aggregations (count/distinct counts)
  3. Time-based variances (period-on-period aggregations)
  4. Time-based trends (YTD, WTD*, daily, hourly, etc.)
  5. Cumulative trends (basis time periods)

The first five are covered in this article; stay tuned for Part 2 for an elaboration of the remaining five:

  6. Moving averages (weekly, monthly, etc.)
  7. Distributions (Poisson, normal/Gaussian, etc.)
  8. Percents of total & percentage differences (over periods)
  9. Predictive trends (linear and non-linear)
  10. Hierarchy considerations (layers of your data)

*WTD/YTD — refers to Week To Date or Year to Date data, mostly used in time series trend analyses and comparisons.

Now, this list is quite diverse, as is most data. It is a personal collection of the techniques I’d use on my own data, which comes in all shapes and sizes.

Data Diversity (note):

Know that data can be discrete or continuous, time-based, aggregate or non-aggregate, transactional, multi-layered, and heavily customized based on n, the number of dimensions in your data.

So feel free to skip any techniques that don’t apply to the kind of data you have.

Let’s jump right into it now.

Simple aggregations (Sum/Average/Median etc.)

Simple aggregations are the same things we’ve been doing with our data all along: adding things up, subtracting them, finding an average, and so on. An exhaustive list of these methods would be very long, but we all know a few common ones, such as:

  • Sum
  • Mean
  • Median
  • Mode
  • Percentiles

These are very basic, yet still some of the most useful aggregates when working with transactional data (and for getting a complete picture). You’ll see, however, how much more powerful they become when combined with the other techniques we’ll explore ahead.
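As a quick sketch (with made-up transaction amounts), these aggregates are one-liners with Python’s standard library:

```python
# Simple aggregations over one day's bakery transactions.
# The amounts below are invented for illustration.
import statistics

transactions = [120, 80, 80, 250, 60, 310, 80, 150]

total = sum(transactions)                 # Sum: 1130
mean = statistics.mean(transactions)      # Mean: 141.25
median = statistics.median(transactions)  # Median: 100.0
mode = statistics.mode(transactions)      # Mode (most frequent value): 80
deciles = statistics.quantiles(transactions, n=10)  # 9 cut points, incl. ~90th percentile

print(total, mean, median, mode)
```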

Other aggregations (count/distinct counts)

Aggregations like count or distinct count are similar, but often much more useful, and usually need to be checked and understood in most organizations. For example, how would a B2C organization know how many unique customers it served this year, month, or week?

Let’s consider a bakery (we’ll reuse this example in the dissections that follow).

You may want to know how many customers bought an item from your bakery today and also the number of transactions they made. The simple fact that a customer can visit your business multiple times will tell you different things, such as:

  • How many customers have you had today (assume this is 100)
  • How many unique transactions did these 100 customers make (assume this turns out to be 250)
  • This can help you then derive a new KPI out of these two i.e., average transactions per customer (simple math says this is 250/100 or 2.5)
  • What if I tell you that 25% of your customers (25 of them) made only one transaction? Those 25 customers account for 25 transactions, so the remaining 75 customers made 250 - 25 = 225 transactions, and the average for customers with multiple transactions rises to 225/75, or 3
  • Do you see how a small change in the way you see your data shows you something completely new? Imagine if you knew your customers’ demographics and then found out what these repeat customers have in common. You could be better informed when a similar new customer walks in, and drive more sales at the end of every day! (Amazing.)
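The count vs. distinct-count logic above can be sketched in a few lines of pandas (the customer IDs and amounts are made up for illustration):

```python
# Count vs distinct count for the bakery: transactions vs unique customers.
import pandas as pd

sales = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c", "d"],
    "amount":      [50, 20, 75, 30, 45, 25, 60],
})

n_transactions = len(sales)                          # count: 7
n_customers = sales["customer_id"].nunique()         # distinct count: 4
avg_txn_per_customer = n_transactions / n_customers  # derived KPI: 1.75

# Restrict the KPI to repeat customers only:
txn_counts = sales["customer_id"].value_counts()
repeat = txn_counts[txn_counts > 1]                  # customers "a" and "c"
avg_txn_repeat = repeat.sum() / len(repeat)          # (2 + 3) / 2 = 2.5
```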

Time-based variances (period-on-period aggregations)

Time-based data is probably the most amazing kind of data to work with, because the possibilities with time-series data are endless. There’s no end to what you can find out about your data once a time element is added to it (the key is not to fall into the trap of making your dataset so granular that you hardly derive anything from it).

Let’s go back to the bakery example (from the Other aggregations section).

  • What if, by plotting your sales against the time of day, you find that you sell most of your items in the evening? It may make sense to keep enough resources/stock on hand to ensure no potential sale is missed.

This is what can happen within a day. The same kind of trend may hold over weekends, when your sales may be higher than on other days of the week. Let’s go a little further:

  • What if you compared last week’s sales to this week’s? Or simply looked at how you’ve performed week-on-week or month-on-month for the last few weeks or months?
  • Comparing how your performance varies period-on-period can help you understand what works best for your business.
  • Do note that percentage change is flexible in how it’s calculated. It can be computed against different baselines:
    - Percentage change vs start of the period
    - Percentage change vs the previous period
    - Percentage change vs the same period last year
    - Percentage change vs the day you change something in your business or an anchoring data point as such
    - Proportion change vs another period (similar to the above; for example, what % of your sales comes from cakes on weekdays vs weekends)
  • Many questions can be answered with your data, and many correlations can be established where the data supports them, as long as you look at the right things.
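Two of the baselines above can be sketched in pandas as follows (the weekly sales figures are invented):

```python
# Period-on-period percentage changes against different baselines.
import pandas as pd

weekly_sales = pd.Series([1000, 1100, 990, 1200])  # four weeks of sales

# vs the previous period (week-on-week)
vs_prev = weekly_sales.pct_change() * 100           # NaN, 10.0, -10.0, ~21.2

# vs the start of the period (first week as the anchor)
vs_start = (weekly_sales / weekly_sales.iloc[0] - 1) * 100  # 0, 10, -1, 20
```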

Time-based trends (YTD, WTD, daily, hourly, etc.)

Another advantage of working with time-series data is that it moves with time. The change in the data across different timelines is always a delight to unfold. Time, as we know, has multiple layers, and so does the data (think of a streaming business: maybe every second you’re watching something on Netflix is a new data point they capture and use to deliver content to you).

Time-series data comes with special patterns like seasonality and cyclicity that may be a part of your data as well. For instance, people may tend to visit cafes or restaurants more on a weekend than on a weekday.

Let’s consider the bakery again,

  • You may observe that your sales peak in the holiday season (around December), where an offer could boost them further, or that they drop during the summer, when it may be wise to offer products like ice cream that usually sell well in the heat.
  • Any change versus the usual trends you’ve observed over the years can be anything from a macro element impacting your business to just an outlier*.
  • Knowing your period-on-period trends helps you track improvements/declines in your sales continuously. These can then be correlated back to any changes you made in your organization that you may or may not want to continue going further.

*It’s important to remove the impact of outliers from your dataset in order to fully understand the trends, especially any unexplainable outliers. If an outlier is not a data contamination but a meaningful change, make sure you consider it in your analysis as well. Outliers are not bad data; they’re just surprising and need thorough investigation before you include or exclude them. (Do consider both pictures, with and without outliers, before you arrive at a conclusion.)
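The daily roll-up and the outlier check from the footnote might look like this in pandas (all numbers are invented, and the 2-standard-deviation z-score threshold is a loose choice for a tiny sample):

```python
# Rolling timestamped sales up to a daily trend, plus a crude outlier flag.
import pandas as pd

ts = pd.Series(
    [10, 12, 11, 90, 13, 12],  # hourly sales; 90 is a suspicious spike
    index=pd.date_range("2023-05-01 09:00", periods=6, freq="h"),
)

daily = ts.resample("D").sum()  # daily totals (here, one day: 148)

# Flag points more than 2 sample standard deviations from the mean.
z = (ts - ts.mean()) / ts.std()
outliers = ts[z.abs() > 2]      # catches the 90

# Re-check the trend with and without the outliers, as the footnote suggests.
clean_mean = ts[z.abs() <= 2].mean()  # 11.6, vs ~24.7 with the spike included
```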

Cumulative trends (basis time periods)

Taking trends one step further, we can analyze cumulative trends. This is by far one of the simplest yet most informative ways to see how your data moves. Cumulative trends can capture aggregations, show how your KPI has been growing over time, and depict seasonality, all in the same graph.

  • In our bakery example, a simple cumulative graph would make it easy to see how cake and pastry sales have been growing your revenue this year.
  • You could also see how much of your revenue in a certain period has come from a certain set of customers (you may also spot a trend in how many days a group of customers typically takes to return).
  • Knowing a number as it stands for the entire dataset is one way to see it, but knowing how it got to that point is a whole different, and much better, way of looking at it.
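A minimal sketch of a cumulative trend (the monthly revenue figures are invented):

```python
# Cumulative revenue: the running total shows growth and seasonality at once.
import pandas as pd

monthly_revenue = pd.Series(
    [100, 120, 90, 150],
    index=["Jan", "Feb", "Mar", "Apr"],
)

cumulative = monthly_revenue.cumsum()  # 100, 220, 310, 460
# Plotting `cumulative` (e.g. cumulative.plot()) shows how revenue reached
# its year-to-date value, not just where it ended up.
```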

STAY TUNED FOR PART 2 OF THIS ARTICLE!

If you know of simpler, less complicated ways to dissect data that aren’t mentioned in this article, do leave them in the comments below for everyone. Thanks!

Conclusion:

There are more ways than we could imagine to understand our data. There’s a quote I abide by, “In God and Data we trust”, so try to extract as much information as you can from your data and help it become your organization’s most useful resource.

Once you have factual information coming directly from your data, there’s no doubt you can help grow your organization by turning it into actionable insights.

Numbers also make more sense when domain knowledge comes into the picture. The knowledge you derive from your data multiplies when you put in the effort to understand the industry you’re working in. Changes driven purely by macro factors* will then be clearer to you.

Try applying the techniques mentioned in this article to your dataset and let me know how it works out for your data in the comments below.

*Macro factors are elements that affect your industry’s performance as a whole, not your organization alone, and are therefore out of your control.

For all my articles, connect with me on LinkedIn: https://www.linkedin.com/in/yash-gupta-dss/

~ P.S. All the views mentioned in the article are my sole opinions. I enjoy sharing my perspectives on Data Science. Do contact me on LinkedIn at — Yash Gupta — if you want to discuss all things related to data further!

Business Analyst at Lognormal Analytics and Data Science Enthusiast! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss