We need to have a talk about what tracking is as a whole, what is good tracking and what is bad tracking. Every single time I write an article about ad-blocking, privacy or analytics, I get feedback about bad tracking, with most people pointing to Google.
Now, I'm going to defend Google in this article. Google is no saint when it comes to tracking people through advertising. But there are a number of misconceptions that I keep hearing over and over again.
So let's go.
The most important misconception is that people say that 'Google is selling their data'. This is absolutely not true. Stop saying that!
But there is a bit of a misconception going on as to what a data broker is. A broker, by definition, is one that buys and sells information.
The keywords here are buying and selling.
Meaning that real data about you, and what you are doing must be sold to someone outside of Google, in a way so that they can read that data for themselves. But Google is NOT DOING THIS!
Stop thinking that they are. We have enough problems with the real data brokers without people confusing the situation even more.
But, wait-a-minute, you say. Isn't Google providing aggregated data to its partners, customers, researchers and consultants?
Yes, Google is doing that. That's no secret. But if you think that is bad, you are missing the key word here, which is that it is aggregated. It's not the raw data.
What does aggregated mean? Well, let me give you a very simple and clear illustration. Imagine that you were tracking the sales of an ice cream stand. After ten customers had purchased via their credit cards, you would know this:
In this raw form we have a potential for a violation of privacy, since we can see exactly what each person bought. For instance, we can see that Mia Green likes Strawberry. And if this ice cream stand started selling this data to others, that would be bad.
But this is the raw data.
Aggregated data, on the other hand, isn't anything like this. Instead, it is a grouped, often segmented result of the raw data.
For instance, with this data we could create 3 aggregated data outputs:
Here we see an aggregated output where, for instance, we can see that 60% of the customers are women, that strawberry is the product most people choose, followed by vanilla. But in terms of product quantity, chocolate is actually in 2nd place.
This type of data is immensely useful, because it gives insight into what people are doing as a group, but you also notice how different it is from the raw data.
With the raw data, we could learn that Mia Green likes strawberry ice cream, but with the aggregated data we have no idea what she likes. We do not even know that she was ever at the store.
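The step from raw to aggregated data can be sketched in a few lines of Python. This is only an illustration: the rows below are invented (only Mia Green's strawberry purchase comes from the text), and the field names and the aggregate function are assumptions made for the example.

```python
from collections import Counter

# Invented rows for illustration. Only Mia Green's strawberry purchase
# comes from the text; every other row is a placeholder.
raw_sales = [
    {"name": "Mia Green", "gender": "F", "flavor": "strawberry", "scoops": 1},
    {"name": "Customer 2", "gender": "F", "flavor": "strawberry", "scoops": 1},
    {"name": "Customer 3", "gender": "F", "flavor": "strawberry", "scoops": 2},
    {"name": "Customer 4", "gender": "M", "flavor": "strawberry", "scoops": 1},
    {"name": "Customer 5", "gender": "M", "flavor": "strawberry", "scoops": 1},
    {"name": "Customer 6", "gender": "F", "flavor": "vanilla", "scoops": 1},
    {"name": "Customer 7", "gender": "F", "flavor": "vanilla", "scoops": 1},
    {"name": "Customer 8", "gender": "M", "flavor": "vanilla", "scoops": 1},
    {"name": "Customer 9", "gender": "F", "flavor": "chocolate", "scoops": 3},
    {"name": "Customer 10", "gender": "M", "flavor": "chocolate", "scoops": 2},
]


def aggregate(sales):
    """Collapse per-person rows into group totals. Names never reach the output."""
    total = len(sales)
    by_gender = Counter(row["gender"] for row in sales)
    customers_per_flavor = Counter(row["flavor"] for row in sales)
    scoops_per_flavor = Counter()
    for row in sales:
        scoops_per_flavor[row["flavor"]] += row["scoops"]
    return {
        "customers": total,
        "gender_share": {g: n / total for g, n in by_gender.items()},
        "customers_per_flavor": dict(customers_per_flavor),
        "scoops_per_flavor": dict(scoops_per_flavor),
    }


report = aggregate(raw_sales)
print(report)
```

The report contains counts and shares only. Nothing in it says who bought what, which is the whole point of aggregating.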
There is no violation of privacy taking place with aggregated data. In fact, everyone shares aggregated data. Including you and me.
For instance, if I ask you how many visitors you had to your website last month, and you then tell me that you had 28,000 uniques, this is an aggregated number. But you are not violating any of your visitors' privacy by revealing how much traffic you have in total.
Aggregated data is the opposite of individual data.
I get so frustrated every single time I read an article by a journalist, or hear comments from people thinking that this is bad and somehow violating their privacy. It's simply not true.
Now let's address the second massive misconception that people keep telling me about. It's that third party analytics tools, like Google Analytics, are a violation of your privacy because your data is sent to and stored by Google.
This is again not true. Or rather, it's not true by default. It is true that you can set up Google Analytics so that data flows across sites and into their advertising system, but it's not how Google Analytics works by default. And it's not how Google Analytics works when I use it on this site.
Let me explain:
The way most people think Google Analytics works is like this. You have one site, like The Verge that is using Google Analytics, and then you have another site, like Samsung, that is also using Google Analytics.
Since both sites use Google Analytics, surely Google knows what you are doing on each site, and can then target you with annoying ads the next time you use Gmail. Right?
No. Just no!
By default, Google Analytics has absolutely no way of tracking you across sites, nor can it match what you do on one site with Google's advertising network.
The reason is that Google Analytics is a first party analytics system, and let me explain what that means.
Every single time we talk about bad tracking, we are always talking about 3rd party tracking, in that a third party is able to track what you do across multiple sites.
The way this works technically, is that an ID is set, by the third party, which is then loaded on every single site you visit (that uses the same system). Like this:
As you can see, with each site you are being tracked as 'user 17252', and the more you do the more they know about you. This can be a bit scary, especially for those advertising networks who also operate as data brokers, and are thus selling and buying data to fill in whatever gaps they have.
However, the only reason why this is possible is because these networks are using 3rd party trackers, often in the form of a cookie. It's that cookie that allows them to keep track of you as you move from one site to the next.
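To make the mechanism concrete, here is a toy model in Python. It is a sketch under assumptions, not how any real ad network is implemented: the ThirdPartyTracker class, the tracker.example cookie key and the site names are all hypothetical, and the starting ID simply reuses the 'user 17252' from the example.

```python
from collections import defaultdict
from itertools import count


class ThirdPartyTracker:
    """Hypothetical ad network whose script is embedded on many sites."""

    def __init__(self):
        # Starting number borrowed from the 'user 17252' example.
        self._next_id = count(17252)
        self.profiles = defaultdict(list)  # visitor ID -> sites seen

    def on_page_load(self, browser_cookies, site):
        # The cookie belongs to the tracker's own domain, so the browser
        # presents the same cookie on every site that embeds the tracker.
        visitor_id = browser_cookies.get("tracker.example")
        if visitor_id is None:
            visitor_id = next(self._next_id)
            browser_cookies["tracker.example"] = visitor_id
        self.profiles[visitor_id].append(site)
        return visitor_id


tracker = ThirdPartyTracker()
cookies = {}  # one browser's cookie jar, keyed by cookie domain
sites = ["site-a.example", "site-b.example", "site-c.example"]
ids = [tracker.on_page_load(cookies, site) for site in sites]
print(ids)
print(dict(tracker.profiles))
```

Because the cookie is keyed to the tracker's domain rather than to the site being visited, all three visits land in the same profile.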
And we see this all over the internet. This is what I wrote about earlier this week when I illustrated how a site like iMore has implemented 106 third party trackers and services, all using third party cookies on their site.
Note: Just because someone is setting a third party cookie doesn't mean it's automatically bad. But, it is a problem when it comes to advertising networks.
But this is NOT how Google Analytics works, even though Google Analytics is a third party service.
Google Analytics uses a first party cookie, which means that each individual site will set their own unique cookie, just for that site. The result is that for every single site, Google Analytics thinks you are a different person.
And no matter how fancy you think the tech world is, there is no way for Google to know that user: 8362 is actually the same as user: 52782.
In the raw data file, all they have is this:
How would Google know that three of these numbers are the same person, but not the other four?
This is why first party cookies are so important. They limit the tracking on a per-site basis. You cannot track people across sites with first party cookies. You need third party cookies to do that, and Google Analytics doesn't use those by default.
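The per-site behavior can be modeled the same way. This is a simplified sketch of the idea described here, not Google Analytics' actual code: the FirstPartyAnalytics class and the site names are hypothetical, and a random UUID stands in for whatever ID a real analytics script would generate.

```python
import uuid


class FirstPartyAnalytics:
    """Hypothetical analytics service that sets one cookie per site."""

    def __init__(self):
        self.hits = []  # (site, visitor ID) pairs, all the service ever sees

    def on_page_load(self, browser_cookies, site):
        # The cookie is scoped to the site being visited, so the lookup key
        # is the site itself. Another site cannot read or reuse it.
        visitor_id = browser_cookies.get(site)
        if visitor_id is None:
            visitor_id = uuid.uuid4().hex
            browser_cookies[site] = visitor_id
        self.hits.append((site, visitor_id))
        return visitor_id


analytics = FirstPartyAnalytics()
cookies = {}  # one browser's cookie jar, keyed by site
first = analytics.on_page_load(cookies, "site-a.example")
again = analytics.on_page_load(cookies, "site-a.example")
other = analytics.on_page_load(cookies, "site-b.example")
print(first == again)  # same site, same visitor ID
print(first == other)  # different site, unrelated visitor ID
```

The same browser is recognized on a return visit to one site, but shows up as an unrelated visitor on the next site, so there is nothing to join the two on.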
What's even more important is that Google itself cannot use the data either, because the data doesn't match.
Imagine Google wanted to link Google DoubleClick (its ad network) to what people do on The Verge. What DoubleClick would then do is to look into the Google Analytics database to see if it can match the IDs. And remember, our 3rd party cookie id was '17252'. But since no such match is found in the Google Analytics raw database, Google cannot link or use any of the data for advertising.
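The failed lookup can be written out as a tiny Python check. The ID values are the ones used in this article's examples; the find_matches function, the site names and the data layout are assumptions made purely for illustration.

```python
def find_matches(ad_cookie_id, analytics_ids_by_site):
    """Return the sites whose analytics data contains the ad network's ID."""
    return sorted(
        site
        for site, ids in analytics_ids_by_site.items()
        if ad_cookie_id in ids
    )


# Per-site visitor IDs, as a first party setup would store them.
analytics_ids_by_site = {
    "site-a.example": {"8362"},
    "site-b.example": {"52782"},
}

matches = find_matches("17252", analytics_ids_by_site)
print(matches)  # []
```

With no shared ID there is no row to match, so the ad profile and the analytics data stay separate.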
This is such an important thing to remember. By default, Google Analytics is site-specific, and Google cannot use or track you to other parts of Google, or between other sites.
Note: While I am talking about Google Analytics here, the same thing applies to many other third party services, like Chartbeat.
However, this doesn't mean that you can't enable it. Brands and publishers have the ability to turn on third party tracking with Google Analytics.
We see this, for instance, in the demographics report:
But you will notice that it says:
And if a brand or a publisher enables this, four things happen:
And the last item is the important one here. Because what this does, is that it links Google Analytics with Google DoubleClick, by adding the Google DoubleClick third party cookie.
Suddenly, Google Analytics can track you across sites, and Google can match you to the rest of Google and its advertising network. But it has to be enabled by each individual site. Just because it's enabled on The Verge, doesn't mean it's enabled on Samsung.com, even though they both use Google Analytics.
On this site, for instance, I have NOT enabled it. Partly because I'm a subscription-based site with no need for that type of integration, and partly because I don't feel it's acceptable for me to do so.
You see the difference? You see how just using Google Analytics doesn't mean your data is sent to Google for them to use as they please?
Note: You can check if it is enabled or not. Just look at what cookies a site is setting, and you can instantly spot whether 3rd party tracking is active or not.
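If you have a list of the cookie domains a page sets (your browser's developer tools will show them), the check can be roughly automated. The sketch below is a simplification: it treats a cookie as first party when its domain equals the page's host or is a parent of it, which is not the full rule browsers apply, and the function names and example domains are hypothetical.

```python
def is_third_party(cookie_domain, page_host):
    """Rough check: does this cookie belong to a domain other than the page's?"""
    cookie_domain = cookie_domain.lstrip(".").lower()
    page_host = page_host.lower()
    same_site = page_host == cookie_domain or page_host.endswith("." + cookie_domain)
    return not same_site


def third_party_domains(cookie_domains, page_host):
    """Return the distinct cookie domains that look third party for this page."""
    return sorted(
        {d.lstrip(".").lower() for d in cookie_domains if is_third_party(d, page_host)}
    )


# Cookie domains observed while loading a page on www.publisher.example.
observed = [".publisher.example", "www.publisher.example", ".adnetwork.example"]
flagged = third_party_domains(observed, "www.publisher.example")
print(flagged)
```

Any domain left in the flagged list is one that can recognize you on other sites that load it too.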
This leads us to my third point, and the final big misconception. It isn't really a misconception as such, but we need to define what tracking is.
And to explain this, let's define tracking in levels of severity, discussing whether doing that is reasonable or invasive.
We will start out with the lowest level possible.
I'm surprised by how many people think this should be the norm on the internet. I'm also surprised by how many ad-blockers are not blocking analytics as well as ads.
It seems to be based on a completely wrong idea about what analytics is, and what is being tracked, based on some misconception about how it works in the real world.
The argument I often hear goes something like this:
In the real world, when I visit a shop they have no idea who I am, or what I'm doing. And when I purchase something with cash, they can't track me either. That should be how the internet works. I should be completely anonymous and no tracking should be allowed at all.
They then use this argument for blocking analytics in their browsers.
The problem with this is that it's just not true. Let's use a small bakery as an example.
A local bakery is doing a shit load of analysis. For instance, it is constantly looking at which products people buy, when they buy them, what kind of feedback they are getting, and many other things.
They need this information in order to correctly estimate how much to bake each morning. And we don't have a problem with this. In fact, we expect them to do that because we expect them to have the right type of bread ready for us when we visit them.
Have you ever visited a bakery only to be told that they were out of morning bread? It sucks!
Of course, when I say the bakery is measuring all of these things, it's probably not using an analytics package. It's most likely based on much more basic tools, like taking notes and observing changes over time. But the concept is the same.
There is no such thing as 'no tracking whatsoever'. Everyone is doing tracking in some form or another. And we expect, demand and recognize that companies have a right to know what is happening in their own stores.
Level zero is just stupid. A bakery obviously has the right to track how much bread it sells. It would be insane to suggest otherwise. Similarly, a website obviously has the right to know which pages are viewed.
So, let's talk about level one tracking. Here companies track what is happening on their sites, but they are not setting any cookies. This means that they cannot track people over time, and every visit you make will appear to be your first.
Again, many people believe this is how the internet should work. "Why should websites be allowed to track me over time?", they say.
But let's compare that with the bakery once more.
Imagine that you started buying your morning bread as a daily ritual. Maybe you are retired and just like to have that morning walk combined with the wonderfulness of freshly made bread. Maybe you drive past the bakery on your way to work, making buying bread the way you get breakfast every morning.
What happens in the bakery when you do that?
Well, the first time you visit the bakery, you are just a random person. But then as you keep coming back day after day, the store assistants in the bakery start to recognize you. They start to think, "Hey, that guy was here yesterday, and the day before .. in fact he seems to come here every day."
The reason why this happens is that we humans are setting a cookie with our brain. When we see someone, we remember it.
This is exactly the same thing that is happening online as well. When you visit a website, we see you through the website, and we remember you via the use of a cookie.
Now, again, you don't have a problem with this in the real world. If you frequently visit a store, you kind of expect that people will recognize you over time.
So the idea that cookies are bad is, of course, just as insane as the idea that we can do no tracking at all.
So, let's look at the scenario of companies using a first party cookie to remember who people are.
In this scenario you are still 100% anonymous, even though companies can now start to remember what you are doing, what you like, and how often you come back.
Again, think about the bakery. If you visit the same bakery every day, the people will want to learn what you like. They might notice that you always arrive around the same time each day, and that you always buy the same morning bread.
After a while, they might start to use this information to give you an even better experience. At first it's simple things like greeting you in a different way. But after a while, as they learn your habits, they might go further and pack your order before you even arrive.
Now they will say: "Ahh, good morning. We have your bread ready for you, just the way you like it."
It's the same if you have a favorite barista at your local coffee shop. You walk up and he says, 'The usual?' ... and you go 'yep!'
So, is tracking people online using a first party cookie a good thing or a bad thing? Well, obviously it's a good thing. It's the very thing that enables us to deliver the excellent customer support that people expect from any great company.
Now let's take this a step further, and add data that isn't directly tied to the website itself, but more to the interaction between a company and its customers.
Take the bakery store again. Just because you visit it each day, and just because they remember that you keep coming back and what products you like, doesn't mean they actually know you.
You are still anonymous as a person. They have no idea what your name is, where you live, what work you have, how much money you make, what hobbies you have, or what products you have bought in any other store. They only know what they can see you do within the bakery itself.
The web works the same way. With the use of a cookie, we can track what you see and how often you come back. But that doesn't give us any information about anything outside the website.
But, one way to expand this is if you voluntarily share that extra information about you.
For instance, if you visit the same bakery every day, it's more than likely that you start to talk with the people there. And during those short conversations, you might tell them things about you.
For instance, you might tell them what kind of job you have or that you live in that house at the end of the road, or some other information that they wouldn't know otherwise. And the people at the bakery would remember it.
So is this a bad form of tracking?
No, of course not. This is what we call having a relationship. And the reason why it isn't bad is because it is you who decides what it is you want to share. They only have the information that you gave them.
You like the idea that they know this, otherwise you wouldn't have told them about it in the first place. You like that when you walk into the bakery they say, "Good morning Thomas!", rather than "Good morning ... uhm... person!".
However, we are starting to enter into a zone where we may have a potential problem.
Imagine that you have told your local bakery about all sorts of things, because you visit them every day, and you have become kind of friends with the people there. Then, a week later, you happen to visit another bakery in another city.
As soon as you walk through the door, the other people there say: "Hey Thomas. Do you want the usual?"
"Wait, what?" How the frak do they know what my 'usual' is? I have never been to this other bakery before.
Well, it turns out that this other bakery is actually a part of the same chain of bakeries, so anything you have told the people in your local store, is automatically made available to all the other stores as well.
You see the problem this creates? When it comes to information that we share about ourselves, we think about it not just in terms of what is being shared, but also who it is being shared with. And if there is a disconnect between who you think you are sharing something with, and who it's actually shared with, then we have a problem.
I didn't expect every bakery within the same chain to know about my life. I only expected my local store to know that.
We see this all the time online. For instance, this is a constant debate around companies like Google, Microsoft and others. What you do in one part of Google is automatically part of every other part of Google.
Personally, I consider Microsoft or Google as one company, not several companies under the same name. So I have no problem with Microsoft combining the data they have about me when I'm using my XBOX, with the data they have about me using any other Microsoft service. It's all part of the same Microsoft. And I feel the same about Google. To me, there is only one Google, in which I use many different services.
But if this is not how you feel about things, I can understand why you would be surprised when your data is suddenly in more places than you expected.
Mind you, this is not a tracking problem, nor does this turn the act of tracking into something bad. This is a disconnect in terms of not realizing who you are interacting with. And every company should do their utmost to make that 100% clear at all times.
Now let's move into the more problematic forms of tracking. First we have the inclusion of external data, meaning data about you or your behavior that you didn't choose to share.
We will start off at the low end with referrals.
If you go into your analytics, you find a report that informs you what site people came from before coming to your site. It's the referral report.
From an analytics perspective, this is immensely useful, because it can help us understand things like the impact of social sharing, versus other referrals. Or whether that spike in traffic was caused by a general interest or just because a single and very popular site linked to yours.
But, it's also a bit problematic. Why should a website be allowed to know what I did before I arrived?
Think about the bakery once more. Why should the bakery be allowed to know that I visited the flower shop before going to the bakery?
Mind you, referral data only tracks actual referrals. It doesn't track if you are just visiting two different sites independently, as some seem to think.
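A referral report is little more than a count of the referring hosts that browsers send along with a request (the HTTP header is called `Referer`). Here is a minimal sketch in Python, assuming each logged hit is a dict with a hypothetical "referrer" field; the hit data and host names are made up.

```python
from collections import Counter
from urllib.parse import urlparse


def referral_report(hits, own_host):
    """Count visits per referring host. Hits without a referrer count as direct."""
    report = Counter()
    for hit in hits:
        referrer = hit.get("referrer")
        if not referrer:
            # Typed URL, bookmark, or a browser that withheld the header.
            report["(direct)"] += 1
            continue
        host = urlparse(referrer).hostname
        if host == own_host:
            continue  # internal navigation, not a referral
        report[host or "(unknown)"] += 1
    return dict(report)


hits = [
    {"page": "/bread", "referrer": "https://flowershop.example/roses"},
    {"page": "/bread", "referrer": ""},
    {"page": "/cakes", "referrer": "https://flowershop.example/tulips"},
    {"page": "/cakes", "referrer": "https://bakery.example/bread"},
]
report = referral_report(hits, "bakery.example")
print(report)
```

Only visits that actually arrived through a link carry a referrer. A visitor who opened the flower shop and the bakery in two separate tabs shows up as direct.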
We saw the same thing with search queries five years ago. Back then, search engines would include the search query with the referral data, so that a website would know the exact phrase that people had been searching for before coming to the site.
Today, of course, we don't have this anymore. Partly because of the privacy issue around that, and partly because it was being used in harmful ways by SEO companies. So Google and many other search engines blocked that data from being shared.
But if it's not okay to see search data about our visitors, why is it okay to see referral data?
You see the problem?
So this level of tracking is starting to get problematic. I wouldn't say that it is bad, as such, but it probably shouldn't be there. And if we look at the trends about people's reaction to privacy online, I think that referral data might be eliminated within the next 10-15 years, just as it was eliminated with search data.
Another much more problematic practice is when companies start to augment their internal analytics with data they have bought from data brokers.
One example of this is Target, the second largest retailer in the US. I don't know if they are still doing it, but a few years ago it was revealed that every single time you purchased something from them, they would match your credit card details with information they purchased from data brokers.
This would mean, for instance, that not only would they know what products you bought when you were shopping at Target (which is fine), but they would also know what products you bought in other stores. Even worse, the data they purchased was reported to include far more sensitive things, like social status, health issues, and other deeply personal information.
This is a very bad form of tracking, and I'm surprised that it's even legal for them to do it. In fact, this very thing is illegal in many parts of Europe.
In my country, for instance, we have a 'Personal Data Protection Act' that says, among other things:
A company may not disclose data concerning a consumer to a third company for the purpose of marketing or use such data on behalf of a third company for this purpose, unless the consumer has given his explicit consent.
So, clearly the practice that we see mainly in the US about the use of data brokers to 'fill in the gaps' needs to be stopped!
Then we have the big one. The use of third party cookies that advertising networks use to track people across the web. As I already explained, the third party cookie allows third party partners to see what sites you visit and to use that to build up a profile about you.
This information can then be used for many things. It can be used for regular advertising targeting across sites, and it can be used for remarketing purposes, where what you do on one site reflects the type of ads you see on another site.
It's important to mention here, that often this data isn't actually being sold to anyone. Google's ad network tracks you on every site that uses it, but Google never sells that data. It's the same with Microsoft's ad network, and Facebook's, and Apple's iAds, and all the other ad networks whose names you may have never heard.
The problem here is who gets the data?
One example of this is Facebook. It is very well known that Facebook uses the Like button that is placed on pretty much every site online to track what you are doing. This means that if you frequent a site about drones, Facebook will collect that information and add it to its targeting profile about you.
This is then used so that when a company wants to place an ad for a new drone on Facebook, the ads will be targeted at you.
This isn't really done for any nefarious reasons, but why should Facebook be allowed to know what I do when I visit other sites?
It's the same with all the other ad networks. They all work this way.
From a trend perspective, this is the very problem that is partly causing the rise in ad-blockers. People have had enough of third party companies grabbing data and using it to boost their own business.
As I mention in my article about blocking, it's a huge problem. A site like iMore (a popular site about Apple) is including 106 third party sites, many of which are ad networks that are collecting data that they can then use to sell ads.
This is clearly a bad form of tracking. If we look at the trends, this will either get blocked or made illegal by law in the future. This, of course, will have a huge impact on how these ad networks can operate.
Take Facebook. I have no problem with Facebook building a profile about who I am and what I do ON Facebook itself. Nor do I have a problem with Facebook using that information to sell ads based on my interests ON Facebook.
The problem is when Facebook starts to collect data about what I do outside Facebook. I don't care that those sites have a Like button on their pages. I'm NOT on Facebook.
This is the difference we talk about with first party and third party cookies. Tracking with first party cookies is fine. Tracking me on other sites with third party cookies is not.
Mind you, I'm not saying we should block third party cookies. That is a stupid solution, almost as stupid as the EU cookie law (which doesn't distinguish between first party and third party cookies at all).
I'm saying that third parties should not be allowed to track what people are doing on sites outside their own.
Note: I'm writing another article about how ad networks can work in a 'no third party cookie world', so stay tuned.
Finally, we have the worst form of tracking of all, which is third party companies known as data brokers, who exist purely for the sake of buying and selling data about people.
This obviously shouldn't be allowed to exist. Personal data is not something other people can own, and the very idea that companies exist purely to profit on it is deeply offensive, and a massive violation of privacy.
In fact, these companies are already illegal in many parts of Europe.
There is a catch here, though. The catch is whether the information is public or private. You see, some of the largest data brokers in the world are companies that we don't normally think of as such.
I'm talking about the newspapers and magazines (especially the gossip magazines).
When a newspaper collects personal information to write a story (which is done for the purpose of making a profit), they are actually behaving exactly the same way as a data broker. They are taking information from one place and selling it to whoever wants to read about it.
And there are plenty of examples of where newspapers have crossed the line between public and private information. The Sony hack is a perfect example of private information that they had no right to sell as part of their newspapers, but many did so anyway. But it's very hard to know where the line is between public and private.
At the same time, public information is different, right? So is information about the actions of public officials.
Imagine that a newspaper came by an internal email that revealed that the CEO of Volkswagen not only knew about their cheating with diesel emissions, but had directly ordered it.
Note: I know of no such email, I'm just using it as a thought experiment.
Would you publish it in the newspaper? I would.
Mind you, the email is actually private information, but since it's linked to a question of public trust and public activity, it becomes fair game to write about it.
So, data brokers that exist for the purpose of selling data for marketing are clearly not acceptable, and should be made illegal. But data brokers that exist for the sake of journalism are... tricky.
Obviously, if it involves hacking into people's phones, like what Murdoch's papers did in the UK, that should clearly be against the law. But what if it involves a public scandal?
I hope this article has helped you see the many nuances that exist with tracking online, and helped remove many of the misconceptions that we hear about all the time.
There is no question that this whole area is in a state of transformation. The internet has operated much like the 'wild west' of data, and we are now starting to develop some ethical ground rules for what we consider to be acceptable and what is not.
In terms of trend forecasting, it's pretty obvious that third party use of data will either be eliminated or dramatically blocked.
What's important though, is to remember that third party data isn't third party just because it's stored on another server. It's how it is being stored, and how it can be used that makes the difference.
As I mentioned, Google Analytics (and others) is a good example of this. It's the same when many startups are using Amazon Cloud (or other cloud services) to manage the data they have. Just because it's stored in Amazon's data centers doesn't mean Amazon can use it (or even have access to it).
It's so important that we understand and remember this distinction going forward. A world without the cloud, and a world without third party services isn't a very good one. It's only a part of that world that is a problem.
When we forget about this, we end up with bad solutions and stupid laws. Like when ad blockers block first party anonymous analytics and prevent sites from giving people a better customer experience.
Or the stupid EU cookie law that does nothing but annoy people without considering if there is anything to be annoyed about in the first place.
We need to have a smarter conversation about this.
Founder, media analyst, author, and publisher.
"Thomas Baekdal is one of Scandinavia's most sought-after experts in the digitization of media companies. He has made himself known for his analysis of how digitization has changed the way we consume media."
Swedish business magazine, Resumé