Plus Report - By Thomas Baekdal - May 2018

How Do You Identify Real People?

The way we define people in analytics and the way we do it in real life are very often not the same thing, and I have talked about this before. The most visible example is the unique user counts reported by media companies. Often their total traffic exceeds their total market, which, even at the best of times, isn't very realistic.

There are many reasons for this. We have problems with how to measure people across browsers and devices; we have problems with ad blockers; and bots are a growing problem.

So what is the problem with bots?

Well, in simple terms, bots are traffic that isn't coming from real people, and they make up a staggering share of total traffic. To get real insights, we need to identify and filter them out; otherwise we end up with very inaccurate results.

Some bots, like Googlebot (for Google Search), are easily identified, but quite a lot of activity isn't that simple to classify.

To give you an example, if I look at my data, only 8% of my traffic is identified as real people. 31% show up as unique visitors but fail to behave in a way that a human would, and 61% are directly identified as bots (using the system I have for identifying them).

Many analytics systems are pretty good at filtering out most of the 'bad traffic'. Tools like Google Analytics and Adobe Analytics have built-in systems for detecting and removing bot traffic. Adobe Analytics, for instance, uses the IAB/ABC International Spiders and Bots List (but there are plenty of open-source and free lists available). GA and Adobe also filter out all the bots that don't activate your scripts.

The problem is that these systems only identify and filter out bots that are marked as such. They can't detect traffic that looks legitimate but isn't.
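To make that limitation concrete, here is a minimal sketch of list-based filtering. It assumes bots are matched by substrings in the user-agent string. The pattern list, the function name `is_listed_bot`, and the two sample user-agents are hypothetical illustrations, not the contents of any real list or the logic of any particular analytics product.

```python
import re

# Hypothetical, tiny pattern list for illustration only.
# Real spider-and-bot lists are far longer and maintained by others.
KNOWN_BOT_PATTERNS = ["googlebot", "bingbot", "crawler", "spider"]

# One case-insensitive regex that matches any listed pattern.
BOT_RE = re.compile(
    "|".join(re.escape(p) for p in KNOWN_BOT_PATTERNS),
    re.IGNORECASE,
)


def is_listed_bot(user_agent: str) -> bool:
    """Return True if the user-agent contains a pattern from the list."""
    return bool(BOT_RE.search(user_agent))


# A bot that declares itself is caught by the list.
declared = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# A bot sending a browser-like user-agent looks like a normal visitor,
# so a list-based check alone lets it through.
disguised = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"
)

print(is_listed_bot(declared))
print(is_listed_bot(disguised))
```

The second user-agent is indistinguishable from a regular browser at this stage, which is why a list can only ever be the first filter.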

This is obviously not a new problem, but, as publishers spend more and more resources on their own data and analytics capabilities, understanding how big an impact this has is vital to your future plans.

For instance, just last week, we heard about how Hearst is opening a 20-person data studio, focusing on bringing their '1st-party data' to advertisers. But are they filtering out all the invalid data?

I don't know how they work, but I can tell you that most publishers don't filter their data correctly. Not because they don't want to, but because they don't realize just how much of their internal (1st-party) data is filled with invalid views.

So, in this somewhat technical article, let's talk about how to accurately identify real traffic, and why you really need to look at your data in stages.

I'm not going to show you any coding, but we need to talk about how the internet works.

How bad is it really?

One of the amazing things about doing your own analytics is that you have access to the raw data. This means that you can do a lot of detailed analysis that you can't really do with an external service.

One thing that I do, for instance, is to output a list of 'user-agents' (which is what identifies which browser/device people are using), and then compare it to what actually happens on my site.
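For readers who want to try this on their own raw data, a user-agent list of this kind could be produced roughly as follows. This is a sketch under assumptions: it expects access-log lines in the common "combined" format, where the user-agent is the last quoted field, and the function name `count_user_agents` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" access-log format, where the user-agent
# is the last double-quoted field on each line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Tally how many requests each user-agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_RE.search(line)
        if match:  # skip lines that do not end with a quoted field
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (documentation IP addresses, invented paths).
sample_log = [
    '203.0.113.5 - - [10/May/2018:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.9 - - [10/May/2018:10:00:02 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.7 - - [10/May/2018:10:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    "malformed line without quoted fields",
]

for user_agent, hits in count_user_agents(sample_log).most_common():
    print(hits, user_agent)
```

The resulting tally is only the starting point: the comparison with what each user-agent actually does on the site still has to happen afterwards.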

This 24-page report is exclusive for subscribers.


Thomas Baekdal

Founder, media analyst, author, and publisher.

"Thomas Baekdal is one of Scandinavia's most sought-after experts in the digitization of media companies. He has made himself known for his analysis of how digitization has changed the way we consume media."
Swedish business magazine, Resumé

