Measuring Results: Don't be Fooled By Math




Written by Thomas Baekdal on May 9, 2013

Shared By Plus Subscriber

Daniel Van+Meer


This is Baekdal Plus content. It is shared with you for free by a member. Please reshare it.

In my report about "Making Sense of Social Media Monitoring and Sentiment Analysis", I very briefly mentioned the problem with using averages. But a number of people commented that this one issue is a constant problem they face every day.

There are many problems with averages. In fact, there are many problems with using math in the first place. As an analyst I use math every day. When I'm studying a brand or a publisher, I sometimes live inside a spreadsheet for a whole week. Here, for example, is one of the many spreadsheets I worked on for "Reverse Engineering Facebook EdgeRank - Beyond the Theory".

But math has one big shortcoming. It provides an extremely precise answer to what is often a vague question. For instance, 2+2+8 equals 12, with an average of 4. But if those numbers represent products bought by three people, nobody actually bought four products.

You get a precise answer to a vague question. Your question is, "what are people buying?" and math gives you four products on average. See the problem?
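That gap between the precise answer and the vague question is easy to demonstrate. Here is a minimal Python sketch using the same three numbers:

```python
from statistics import mean

# Three hypothetical customers and how many products each bought
purchases = [2, 2, 8]

avg = mean(purchases)    # 4 -- an extremely precise answer
print(avg)

# ...but no customer actually bought four products
print(avg in purchases)  # the average describes nobody in the data
```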

As an analyst, I very often find myself in conflict with math. When you are analysing vague concepts like user behaviors or future trends, math has a tendency to distract you from seeing the real effect. And using averages is one of the worst offenders of all.

So, let me give you a few simple examples of the problem and how to do it better:

The curse of averages

Using averages has long been a favorite of business analysts. It provides a simple straightforward number, without all the complexity involved in having to look at the details. If you are a sales manager, telling your CEO that 'on average' product A is doing 37% better than product B, is very effective.

But averages are also one of the main reasons why so many companies fail to understand what is really happening to their business.

One simple example is these two graphs. They are complete opposites, but have exactly the same average. And this is true not just for the average of the values (the vertical axis), but also along the horizontal axis.
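As a sketch of that effect (the numbers are invented, not taken from the article's graphs), here are two completely opposite trends that collapse to the same average:

```python
from statistics import mean

# Two made-up, completely opposite trends
rising  = [1, 2, 3, 4, 5]   # steadily going up
falling = [5, 4, 3, 2, 1]   # steadily going down

# Averaging throws the direction away: both collapse to the same number
print(mean(rising), mean(falling))
```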

But this is also largely useless, because no company would ever have a graph like this. The real world is far more complicated, which makes using averages even more misleading.

So let's look at some real world examples:

Different products

Here is one example showing the total for two different things (this could be sales, products or people). As you can see, the average is somewhere in the middle.

So, this graph could illustrate the difference between free traffic and subscriber traffic, for example. Or customers versus non-customers. But if you take the average, you end up thinking that these two groups are the same.

Never calculate the average of two completely different things. The people in one box might not be the same as the people in another box.

If you look at content or products in a webshop, you might find a result like this:

On 'average' you end up thinking that things are going better than they really are. This is a common error, made not just by people but also by the tools that we use.

Your analytical tool might tell you the average page views per month. But what if you started to analyse what each product was about? What if you learned that all the 'spikes' are special series products while the rest are just your basic products? Then you would suddenly have two 'averages'. One average for your popular products and one for your basic products.

This helps you understand people's behavior, and maybe that you should focus more of your time making those special products.
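A hedged sketch of that split (all the page-view numbers and product names below are invented for illustration):

```python
from statistics import mean

# Hypothetical monthly page views per product
page_views = {
    "Basic A": 120, "Basic B": 140, "Special X": 900,
    "Basic C": 110, "Special Y": 1100, "Basic D": 130,
}

# One overall average: a single misleading number
overall = mean(page_views.values())

# Split by product type before averaging
special = [v for k, v in page_views.items() if k.startswith("Special")]
basic   = [v for k, v in page_views.items() if k.startswith("Basic")]

print(round(overall))   # describes neither group
print(mean(special))    # your hit products
print(mean(basic))      # your baseline products
```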

Average by age

Imagine that you are doing a study to see who is buying your products, by age. The result turns out to be a graph like the one below. And, based on this, you decide to calculate that the average age is 34 years old.

This is wrong for so many reasons. First of all, when you tell people about an average, people naturally assume that the average also represents the majority. And it doesn't.

Saying that your market is 34 years old on average is just plain wrong. Secondly, here is another graph from the US Census Bureau, depicting the total population by age:

They are exactly the same. So in the first graph you assumed that your products appealed mostly to younger people, and not as much to the older generation. But in reality, it appeals equally to everyone.

In other words, your 'average age' is everyone. There is no average here.

This is a very common mistake that people make in surveys. Don't calculate average ages unless one specific age group is clearly differentiated from the rest.
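One way to guard against this mistake in code, as a sketch with invented buyer counts and a made-up population baseline (not real Census figures): divide the buyers in each age bracket by the size of that bracket before drawing any conclusions.

```python
# Hypothetical buyers per age bracket, next to a made-up population baseline
buyers     = {"18-29": 400,  "30-44": 500,  "45-64": 450,  "65+": 250}
population = {"18-29": 4000, "30-44": 5000, "45-64": 4500, "65+": 2500}

# Raw counts suggest 30-44 is your core market...
# ...but the purchase *rate* per age group tells the real story
rates = {age: buyers[age] / population[age] for age in buyers}

# Every group buys at the same 10% rate: there is no "average age"
print(rates)
```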

The long tail

Now let's move on to one of the graphs that we see most often in the digital world: the long tail. The long tail is a very simple graph. It starts with a few high-rollers (the short head), followed by a very long tail of minor players. The individual elements within the long tail itself are mostly meaningless, but the sheer number of them quickly creates a lot of mass.

One example of this is from a new study by Google Analytics. It shows how many days or steps people take before they buy your products.

As you can see, it's the classic long tail effect. A lot of people buy your products on the very first day, but a number of people take more than one day to decide.

The long tail is very powerful.

Now the temptation is to calculate the 'average number of days', but when you do that you get a number that has no relevance to anything. The average might tell you that it takes six days for people to decide, but only 0.7% actually buy anything on that day.

You can't calculate averages for a long tail. You have a short head (one type of behavior) and the long tail as a whole (another type of behavior). There is no average.
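A sketch of the problem with synthetic long-tail data (the distribution below is invented to have the classic shape; it is not the Google Analytics data):

```python
from statistics import mean

# Synthetic days-to-purchase for 1000 hypothetical buyers:
# 600 buy on day 1 (the short head), the rest trail off in a long tail.
days = [1] * 600 + [d for d in range(2, 42) for _ in range(10)]

avg_days = mean(days)
share_on_avg_day = days.count(round(avg_days)) / len(days)

print(round(avg_days, 1))   # the precise "average" decision time
print(share_on_avg_day)     # only 1% of buyers actually decide on that day
```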

And this is true for many different situations. When your sales are spread across a wide, flat distribution, trying to pinpoint a single point within it leads you to meaningless conclusions.

Defining your group before you look at the data

Another big problem with averages is when you define what groups you want to measure, before you analyse what's actually in them.

A marketing department might decide to measure how many sales they got from their 'image' and 'product' campaigns. As a result, they find that the image campaign group performed better.

Because of this, they decide to spend more of their budget on image campaigns in the future, and cut down on the product campaigns.

Big mistake!

Never, ever, define your groups before you look at the raw data. It's the single biggest mistake that people make when analysing data. Instead, look at the raw data first, and you might find something similar to what you see below:

Looking at the raw data, we find that three image campaigns performed really well. That's wonderful, and you should learn from that. But the rest of the image campaigns actually performed worse than the seven highest performing product campaigns.

So if you had cut down on product campaigns, you might have ended up selling fewer products... not more. Yes, the average for image campaigns is higher, but we don't care about that.
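A sketch of why the group averages mislead here, using invented per-campaign sales numbers shaped like the example above:

```python
from statistics import mean

# Hypothetical sales per individual campaign (not per pre-defined group)
image_campaigns   = [95, 90, 88, 20, 18, 15, 12, 10]  # three big hits, then a drop
product_campaigns = [45, 44, 42, 40, 38, 36, 35, 30]  # consistently solid

# The averages say image "wins"...
print(mean(image_campaigns) > mean(product_campaigns))

# ...but most image campaigns underperform the seven best product campaigns
weak_image = [s for s in image_campaigns if s < min(product_campaigns[:7])]
print(len(weak_image))   # 5 of the 8 image campaigns
```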

Another example of this is when you define not just your product group first, but also start to segment your audience. What you might find is that men predominantly like Product Group A, while women prefer Product Group B.

Based on this you decide to create more masculine advertising campaigns for Product Group A, and more feminine campaigns for Product Group B, right?

Well, maybe... but if you had looked at the raw data first, you might have found this:

Now you suddenly see that one of your B products is actually mostly purchased by men, and one of your A products is mostly purchased by women. So if you had made your entire product A more masculine, you might potentially have lost a lot of sales.

Don't use averages to put people into boxes, and don't define your groups before you look at the data. Look at the data first, then you can identify the patterns and define groups to put them in.

And this is not just about brands and products. The same is true for content and publishers. Imagine that you are running a lifestyle magazine and you look at your revenue for your individual sections.

For instance, you might find that your 'Interior design' section is performing almost twice as well as your 'Food & Garden' section. So, in light of that, you are considering dropping the 'Food & Garden' section and reallocating (or firing) those resources.

But that's a classic mistake caused by only looking at averages.

Imagine that, instead of looking at the average per section, you look at the individual performance per article, color coded by who wrote each one. You'd find this:

Suddenly you realize that your problems with the 'Food & Garden' section have nothing to do with that section at all. Instead, the low performance is caused by one of your journalists. Sophia's articles perform remarkably well regardless of what they are about, while Kate's articles are more hit and miss.

Looking at the average by section caused you to completely miss the real problem. And the reason for that was that you had defined your groupings before you looked at the data. You defined the section, and then calculated the averages.

Instead, you should have mapped out all the content individually, identified the patterns, and grouped the articles into something you could take action on.
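That "map first, group later" step can be sketched in a few lines. The revenue figures below are invented; only the section names and the Sophia/Kate pattern are taken from the example above:

```python
from statistics import mean
from collections import defaultdict

# Hypothetical revenue per article: (section, author, revenue)
articles = [
    ("Interior design", "Sophia", 900), ("Interior design", "Sophia", 850),
    ("Interior design", "Kate",   300), ("Food & Garden",   "Sophia", 880),
    ("Food & Garden",   "Kate",   250), ("Food & Garden",   "Kate",   200),
]

def avg_by(key_index):
    """Average revenue grouped by one column of the raw data."""
    groups = defaultdict(list)
    for row in articles:
        groups[row[key_index]].append(row[2])
    return {k: round(mean(v)) for k, v in groups.items()}

print(avg_by(0))  # by section: 'Food & Garden' looks like the problem
print(avg_by(1))  # by author: the real pattern is Sophia vs Kate
```

The point of the sketch: the same raw data supports both groupings, so compute the grouping only after you have looked at the individual rows.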

In this case, you could look at why Kate is performing worse than Sophia. Is it because Sophia has more followers on her social channels, and is using her influence to make her articles more popular? Is it because Sophia is also your event organizer and is therefore more publicly visible than Kate? Is it because Kate doesn't have the necessary depth, drive or energy?

Before, while just looking at averages per section, you were contemplating closing down "Food & Garden". Now, you know exactly what the problem is, so maybe you could instead focus on helping Kate become just as influential as Sophia. In fact, you should use Sophia to position Kate even better for your readers.

Instead of focusing on what to close, you could focus on what to improve.

How remarkable is that?

The basic problem here is the internal silos within organizations. If you are a brand and you have three different product lines, each product line is managed by a separate product team. That in turn causes you to define each product line as a group, and evaluate each group as an average against the others.

Big mistake.

It's not a problem to have people work within a niche. It helps people keep the focus and stay relevant. The problem is when you learn something different from each silo. That's when you will end up with big problems if you just mix it all together as an average number.

A simple way to prevent making these mistakes

How do you prevent yourself from making all these mistakes? What many people do wrong, in my opinion, is to start with the math. They start off with a spreadsheet full of numbers, and then they do all these fancy calculations to find answers. These calculations provide very precise numbers, which they then turn into meaningless graphs that nobody can use for anything.

It's an age old problem of math giving precise answers to vague questions.

What I do instead, is to start by turning all the raw data into a graph before I do any calculations. And then I try to visually identify the overall patterns. Is something standing out, or do the lines just look like cooked spaghetti mixed together in one big mess?

You create a visual representation of the data, and if it isn't immediately clear what is going on, you start to work the data. You sort the data in a different way, you take something out, you compare it with something else.

Essentially, instead of calculating what the answer must be (which is often wrong because you don't really know what the question is), you fiddle with the data until you see a pattern. And only then do you turn to math to test your assumptions.

It's the opposite of how most people work (and how most tools work).

Here is how I analyzed the traffic patterns for my Plus articles, in "The Future of Analytics And The Trend of Demonstrable Causation". I started out with converting all the data into one big graph. I quickly discovered that the articles followed a three-step pattern. First we have the initial exposure, followed by sharing which creates secondary exposure points. But after a week, the articles ended up in the long tail.

I then zoomed out even more, and found that the initial seven days of traffic (the short head) created as much traffic as the following 30 days, which in turn was the same amount of traffic as the following 100 days.
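A check like that is easy to express in code. This sketch uses synthetic daily-traffic numbers shaped to match the pattern described above; the real figures are not in the article:

```python
# Synthetic daily traffic shaped like the pattern described:
# the first 7 days, the next 30 days, and the following 100 days
# each contribute roughly the same total traffic.
head = [100] * 7          # initial exposure
mid  = [700 / 30] * 30    # sharing / secondary exposure
tail = [700 / 100] * 100  # the long tail
traffic = head + mid + tail

# Sum the traffic in each of the three windows
windows = [sum(traffic[:7]), sum(traffic[7:37]), sum(traffic[37:137])]
print([round(w) for w in windows])   # equal mass in each window
```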

Don't just start with the math. Create a visual image of what you are trying to analyse. Find the patterns. Test your assumptions to understand why it happened that way, and then do the math to figure out just how big an impact it has.

But remember, don't just add numbers together. Math gives precise answers to vague questions. If you don't understand the question, your answers won't mean anything. And you could end up making ill-informed marketing decisions as a result.

Also read: "Actively Avoid Insights: 4 Useful KPI Measurement Techniques".



Thomas Baekdal

Founder of Baekdal, author, writer, strategic consultant, and new media advocate.


Check out my book: THE SHIFT - from print to digital and beyond? Free for Baekdal Plus subscribers, $8.79 on Amazon.
