Anyone who has ever tried to analyze the performance of a website is familiar with the problem of inaccurate data. I have written about this in many of my articles. It's not that the analytics tools are deliberately trying to mislead us, it's just that there is a limit to how accurately we can measure things.
I want to give you one very simple check that you can do yourself, which illustrates just how skewed things are. We are going to look at the most basic metric of all, how much traffic you have, and see if we can work out how accurate (or inaccurate) that number is.
For this check, however, you do need one key metric, and that is the UserID. The UserID is a custom variable that you set to identify exactly who someone is. It's a metric you have if you are running any site that requires people to sign in. So, any media site based on registered users or subscribers, or any web shop that people visit fairly regularly, will work for this.
Sadly, this won't work for brand blogs or sites with anonymous traffic, because in those cases we have no user baseline to work with (everyone is anonymous).
Okay? So, let's go.
What you do is create a custom report that only shows UserIDs, and then you output the number of sessions, pageviews, unique pageviews, users, and new users for those UserIDs. The new-users number isn't actually that useful, but, hey, more data!
You also want to do this over a period long enough that each user has a good chance of visiting several times. For newspapers you can do this per month (or maybe even per week), because your frequency is so high, but for a web shop you probably want to extend it to six months or more.
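If your analytics tool lets you export hit-level data, the same report can be sketched with a small aggregation. This is a hypothetical example: the column names (`user_id`, `client_id` for the cookie, `session_id`, `page_path`) and the tiny in-memory dataset are stand-ins for a real export, not any specific tool's schema.

```python
import pandas as pd

# Hypothetical hit-level export: one row per pageview.
# Column names are assumptions; a real export will differ.
hits = pd.DataFrame({
    "user_id":    ["A", "A", "A", "B", "B"],
    "client_id":  ["c1", "c2", "c2", "c3", "c3"],  # the cookie
    "session_id": ["s1", "s2", "s3", "s4", "s4"],
    "page_path":  ["/home", "/a", "/a", "/home", "/b"],
})

# Per-UserID report: sessions, pageviews, unique pageviews,
# and "users" as the analytics tool sees them (distinct cookies).
report = hits.groupby("user_id").agg(
    sessions=("session_id", "nunique"),
    pageviews=("page_path", "size"),
    unique_pageviews=("page_path", "nunique"),
    users=("client_id", "nunique"),
)
print(report)
```

The `users` column is the key one: it counts distinct cookies per UserID, which is the number we will compare against '1' per person.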
Here is an example over a six-month date range.
As you can see, the total number of real people who had signed into the site was 97,230 (the total number of UserIDs). We know this because the UserID is set specifically when you identify exactly who each person is.
And if we then look at the traffic stats, things start out pretty well. The first user (345353AE869907) visited this site 22 times, during which she viewed 86 pages, of which 73 were unique pageviews. In other words, she saw a few articles more than once.
Note: I could add here the importance of measuring actual read rates, so that you could also see exactly how many articles this person actually read, as opposed to just clicked on. However, that data wasn't collected in this case.
Now we ask a much more important question. How many users is User:345353AE869907?
The obvious answer to this question should be '1', since this one user is just one person. But you will notice that the analytics system recorded user 345353AE869907 as being 15 different users.
This, of course, shouldn't come as a surprise to you. Users are measured using cookies, so during the 22 sessions this person had, 15 different unique cookies were set. The reason is that this person is probably using multiple devices, and each device sets its own cookies, as well as using different apps, which are sandboxed on mobile devices.
For instance, even if you are signed into a site in Safari on the iPhone, following a link on Twitter will open up an in-app browser window where you are not signed in. That's another new user, even though it's the same person, using the same phone.
We all know this is happening, but most people never really think about how big a difference this makes. And, if you are only looking at the standard analytics dashboards, you might think you have 15 users when you only have one.
That's a pretty scary thought, isn't it?
To put this into perspective, here is the difference between what your standard analytics is telling you and what you actually have, using the data from before.
As you can see, your standard dashboard, which is just counting users, happily reports that you have 180,600 people visiting your site, while the actual number of real people is just 97,230. That's a big difference.
Imagine that you are a newspaper with 25 million users. Well, then it isn't 25 million, but only 13 million. That's a huge difference.
But notice that this isn't all bad. Because of the standard (and misleading) user count, all your other metrics are wrong as well. For instance, your standard dashboard will report that each user only visited 4.6 times over the past six months, when it was actually 8.6 times. It will report that each user only viewed 14.2 unique pages, when that number is actually 26.4 pages.
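The corrected averages can be checked with simple arithmetic, using the numbers from the example above. Note that starting from the rounded 4.6 figure gives roughly 8.5 rather than exactly 8.6; the idea is the same.

```python
# Figures from the example above (rounded dashboard averages).
reported_users = 180_600   # distinct cookies, what the dashboard calls "users"
actual_people  = 97_230    # distinct UserIDs

total_sessions   = 4.6 * reported_users    # dashboard: 4.6 visits per "user"
total_unique_pvs = 14.2 * reported_users   # dashboard: 14.2 unique pages per "user"

# Divide the same totals by real people instead of cookies.
visits_per_person = total_sessions / actual_people     # ~8.5
pages_per_person  = total_unique_pvs / actual_people   # ~26.4

print(round(visits_per_person, 1), round(pages_per_person, 1))
```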
In other words, your audience is actually more loyal than you think.
This, of course, also has a huge impact on things like conversion rates for web shops or subscription rates for newspapers, both of which are likely to be under-reported because they are compared to the wrong measure of users.
Obviously, this is just an example from one analytics account; your analytics will reveal different numbers. In this case the average deviation is +86%, but I have seen deviations much higher and much lower than that. For one of my clients (a magazine), the average deviation was 460%, meaning that the user count reported by their analytics system was 460% higher than the number of actual UserIDs recorded on the site.
It all depends on how people use each site. The higher the frequency of use, the more wrong it is likely to be.
Obviously, the only reason we can do this 'reality check' is because we have UserIDs to work with. Without that, when our traffic is anonymous, we just don't know.
This impacts us in two ways.
Depending on how your site works, your verified traffic might actually be wrong as well. Take user 345353AE869907, whom we had identified as having visited the site using 15 cookies.
What about all the times she visited without having signed in at all? This one person was identified 15 times, but she might have visited the site 20 more times where no UserID was set at all.
So, for your verified traffic (those with UserIDs), depending on how your site works, this number might be even more skewed than you think.
It gets even worse for the people who don't have an account on your site at all, like the kind of traffic you have on a blog or with advertising-based media.
Here, you have no idea, because you have no baseline to work with. It all depends on how people behave. For instance, sites that rely heavily on social traffic, mostly from mobile, are likely to have a ton of traffic that seems to have zero loyalty. Visitors arrive once and never return.
At least, this is what your analytics tells you is happening, because, with the traffic being social and mobile, most of the visits are recorded in a fresh browser (and thus with a different cookie each time).
But is that actually what is happening? We don't know! We have no data that allows us to verify it one way or the other. We cannot measure what people do across different cookies without a UserID.
Similarly, if most of your traffic is desktop-based, for, say, a site about model trains, the cookie information, and thus the user count, is likely to be far more accurate. On desktop, cookies are far more persistent, and an older audience interested in model trains is less likely to be using ad-blockers or other tools that might destroy the data.
This is the reality of our world today. We know that our data is inaccurate, but we don't know in which direction it is inaccurate. It's an unknown.
We just have to do the best we can.
Also, even if you do have a UserID for a part of your audience, don't automatically think you can translate their numbers to the rest of your traffic. The people you have a UserID for are your actual customers and your loyal audience. They represent a different group of people than those who just come to you anonymously.
Accept that this is an unknown.
Instead of just looking at your analytics, which tells you that you have 180,000 users, accept that this number probably isn't correct. Verify the audience you can, but then spend most of your time looking at other parts of your analytics.
In other words, instead of looking at your analytics in terms of absolute numbers (which probably aren't accurate), look for the patterns, the profiles, the behaviors, and the nuances.
Focus your time on trying to identify the people who behave exactly the way you want them to, learn why they are doing that, and then try to get everyone to behave that way as well.
And if you are one of the sites that record UserIDs, do this reality check to see what your baseline is for your verified audience. When you have accurate data to work with, it's important that you use it.