If you follow me on Twitter, you will have noticed that I have talked a lot about what I was doing to get GDPR compliant, including some pretty crazy stuff ...like deleting all trackers and cookies from my site.
This has generated some funny reactions, especially from publishers and friends who work with data, because how can I do all the things I used to do if I don't have any data? How do I do analytics if I don't have the Google Analytics script running?
The reality is that I do have all of those things. I just massively reengineered it to work differently to other sites ... just to go 'all-in' in the name of privacy and GDPR compliance.
I have already written extensively about GDPR in previous articles, but in short, GDPR is the first rule of privacy put into law.
This was something I wrote about all the way back in 2010. The first rule of privacy is very simple. It goes like this:
Just stop and think about this for a moment. Think how many times we in the media industry violate this simple rule.
For instance, if I go to a newspaper to read an article about something, and the newspaper has the Facebook tracking pixel installed, you are violating this simple rule.
I'm sharing my presence with the newspaper, but at no point did I agree to share my behavior and interest with Facebook. It's the same thing with ad code. I don't have a problem seeing an ad on the newspaper itself, but I never gave my permission for you to give that data to someone else.
GDPR is now putting a stop to this by turning the first rule of privacy into law. On top of this, one of the big differences that GDPR makes is that the implied consent that we have always had is no longer enough. Compared to the past, we now have to be in control of the data, which we weren't; we need to have data transparency, which we didn't; we have to be able to show our audience exactly what data we have collected about them and where it is, which we couldn't ... and we have to give readers a way to delete their data.
So, in this rather long article, I'm going to tell you exactly what I did, why I did it, how I implemented my new system, as well as how I am now doing analytics.
Note: This article is specific to my site. If you want to learn what I recommend publishers do in general, I wrote about that in: "Putting GDPR into Action for Publishers".
This was what I did:
The first step was to get a detailed picture of just how much I would need to change, and to do this I did a business-wide data audit. Let me just summarize what I found before I talk about what I changed.
Discovering what data I was using was very simple because I have coded this site myself, so I know where everything is, how everything works, and exactly what data I collect.
What I realized, though, was that my internal analytics system would never be compliant with GDPR, because it was explicitly built from a 'person-first' approach. This meant that every single piece of data was linked to a specific person before it could be added to the database.
So, I had to completely change that.
Next was my payment system. I'm using a mix of PayPal (2010-2017) and Stripe (2018-) to manage all payments. These aren't really a problem, because they already give me full control over my subscriptions, and they are only activated when people specifically choose to subscribe.
So, this was generally fine. I just needed to make sure that these services were themselves compliant with GDPR, that the data shared from my site would be limited to my account, and that's basically it.
Then came 3rd party tools: the services that I had added, embedded content, and similar.
All of these were violating GDPR in every way possible. They were tracking people for more than just the service they provided, there was no consent, I had no control over the data, etc.
This was bad, so here I had to come up with a completely new plan.
Finally, I looked at my external analytics, where I was using Chartbeat and Google Analytics.
For Chartbeat I decided to simply delete it from my site, not because they are really doing anything wrong, but because my internal analytics provides me with a more accurate view of activity per article than Chartbeat does.
For Google Analytics it was a bit trickier. They have done a lot of things to become GDPR compliant. And, as far as I know, they are now offering a fully-compliant way to implement it, where you can change how much data they get depending on the level of consent you have for each user.
The problem for me, however, is that because of the way I have implemented my Plus articles, there is no way for me to load Google Analytics client-side without also sending personally identifiable information. The reason for this is a bit complicated, but it links to the way I have implemented sharing information directly into the URLs.
So for most other sites, using Google Analytics (when implemented correctly) isn't really a problem in terms of GDPR (again, as far as I know), but I had to come up with another solution.
So, this was my audit.
Of course, the one thing missing here is advertising, but since I don't have ads on Baekdal Plus, I didn't have to worry about that at all. But I wrote about how publishers could deal with that here.
Okay... so what now? How do we fix this? What strategic decisions did I make (and why did I make them)?
One way of fixing all of this very easily is just to ask people to give you their consent. This is the approach that I have seen every other publisher focus on, but it's also a really bad idea.
What other publishers talk about is doing it like this:
Here you specifically ask people for their consent, and then people can pick and choose exactly what they are willing to give their consent to.
So, isn't this a good idea, you ask?
No. It's a terrible idea. In fact, this is the worst thing you can possibly do. And the reason is your 'conversion funnel'.
Anyone who has ever worked with analytics knows that the more steps you add to a process, the more people you lose before the end of your conversion funnel.
For instance, here is an example from the 'checkout' experience for a web shop.
Look at how many people they lost with each step. They started with 5,919 people who went to their checkout cart, but they ended up with only 849 who actually bought something.
They lost 86% !!!
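To make the arithmetic explicit, here is a tiny sketch of how that loss is calculated from the two figures quoted above. The function name is just an illustration.

```python
def funnel_loss(entered: int, completed: int) -> float:
    """Return the share of people lost between the first and last funnel step."""
    if entered <= 0:
        raise ValueError("entered must be positive")
    return (entered - completed) / entered


# Figures quoted above: 5,919 reached the checkout cart, 849 bought something.
loss = funnel_loss(5919, 849)
print(f"{loss:.0%} lost")  # 86% lost
```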
This is just one example, but you can go to Google Image Search and look for 'google analytics checkout funnel' and you'll find thousands of similar examples.
So, adding steps to your user experience is the single worst thing you can do, but this is exactly what publishers are now planning to do with GDPR.
But it gets even worse, because with the GDPR 'consent dialogs', you are asking people at the worst time possible to give you consent for something they don't really want to do.
You are also asking people in a way that interrupts what they are trying to do (they just want to read an article), before you have had any chance of proving that article is worth reading, before you have built up any momentum or in any other way been able to prove your value to them.
On top of this, you are asking if they will allow you to share their data to a ton of 3rd party trackers that mostly just show ads (which people hate more than anything).
This has to be the worst idea that the publishing industry has ever come up with. This is going to decimate your revenue growth, because why would you want this to be the first impression that people get when they come to you?
It's utterly insane!
And for this site, I have specific data to prove this.
Previously I had designed a new version of this site, which changed my subscription from a somewhat complicated PayPal checkout experience, to a fully integrated and much easier to use Stripe checkout system. The result of this 'increased usability' was that my subscription-rate has doubled.
So, I know first-hand just how incredibly important it is to give people an experience with as few interruptions and decisions as possible.
I also know that the only way I can get people to sign-up for a free trial and later subscribe to Baekdal Plus, is to build up momentum over time.
So, I need that initial introduction to my articles to be the best it can possibly be. This is why, for instance, I specifically designed Baekdal Plus to be shareable in a way that required no steps at all. Because this is critical to the 'amplification effect' of each shared article.
I don't want any extra steps on my site...but now you are telling me to add this really user-hostile 'consent dialog' so that I can send data to other companies.
Why would I ever do that? It would destroy everything I have worked for, which would be mind-bogglingly stupid!
So, no. There is no way in hell that I'm going to add a consent dialog to my site. I would rather not have any personal data, in exchange for being able to give people a better experience (which would result in better revenue growth from happy subscribers).
For me, the solution to GDPR was not to ask people for consent, but to rethink how everything works so that I wouldn't need to.
But how can I do that?
Well, I started by doing the most extreme thing possible, which was to delete everything.
I completely disabled my internal analytics. I deleted the code that was setting any cookies, and I removed all 3rd party scripts, including things like Google Analytics.
Now I had a completely clean site, with no tracking of any kind. As EFF's privacy tracker reported: "No trackers detected. Hooray for privacy!"
Obviously, this is very nice from a privacy perspective, and I no longer have to worry about GDPR ... but it's also not very optimal.
As a media analyst, I make a living working with data, and with this I don't have any for my own site. That's not going to work.
The question now was: what could I do to get back some data, while still not having to worry about GDPR?
So let's look at each thing:
Cookies are usually used for two things. They can create a better user-experience, like automatically logging you in, which is a really important UX thing to do. And of course, they're also used for tracking (analytics, advertising, etc.).
So what can I do here? Well the regulation states this:
Processing shall be lawful only if and to the extent that at least one of the following applies:
"The data subject has given consent to the processing of his or her personal data for one or more specific purposes".
"Processing is necessary for the performance of a contract to which the data subject is party".
I can't do tracking or analytics, because that isn't part of 'a contract', nor is it really a specific purpose. But I can set a cookie for the sole reason of automatically logging people in (which is something people would expect when you subscribe). Logging people in would be a specific purpose, which is part of the 'consent' that people have given me when they choose to subscribe (the contract).
So, this is what I'm doing.
When you visit this site, a single cookie is always set, but the value of that cookie depends on what 'contract' I have with you. If you haven't subscribed, the cookie is set as 'baekdal=free', which I can't use for anything because I have no way of tracking people based on just the word 'free'.
But if you are a subscriber, I set a subscriber value that is then solely used for logging you in. The cookie is also used for system checks, like validating whether you are a person or a bot, but not in any way that tracks you.
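As a rough sketch of the idea (not the actual code running on this site), the cookie logic could look something like this in Python. The cookie name comes from the 'baekdal=free' example above; the `session_token` parameter and the `Secure`/`HttpOnly` flags are assumptions added for illustration.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "baekdal"  # name taken from the 'baekdal=free' example above


def build_cookie(session_token: Optional[str]) -> str:
    """Return a Set-Cookie header value for a visitor.

    session_token is a hypothetical opaque login token for a subscriber;
    None (or an empty string) means the visitor has no subscription.
    """
    cookie = SimpleCookie()
    # Every non-subscriber gets the identical constant 'free', so the value
    # cannot be used to tell one visitor from another.
    cookie[COOKIE_NAME] = session_token if session_token else "free"
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["secure"] = True     # assumption: site is served over HTTPS
    cookie[COOKIE_NAME]["httponly"] = True   # assumption: no script needs to read it
    return cookie[COOKIE_NAME].OutputString()
```

The point of the design is that the non-subscriber branch never contains anything unique, so there is nothing to track.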
So, we have now sorted out the cookies.
Next up is 3rd party embedded content.
Both Twitter and YouTube have created ways for you to embed their content in a privacy focused way. This means that you can continue to embed videos from YouTube or from Twitter, by just making small changes to your code.
For Twitter, you can find the details for that here. And for YouTube, you can find it in the 'embed video' dialog under "Enable privacy-enhanced mode".
In both cases, they no longer set/read cookies, which means that all their tracking is disabled.
This is great, right?
Well, technically (and legally) this works fine, but the problem is that all the ad/tracking blockers still wrongly detect this as tracking. For instance if you use EFF's tracking tool, it will show you this when you embed a tweet from Twitter even when set to non-tracking.
This is a bit annoying. Twitter isn't tracking anything, and I'm telling my readers that I don't track anything, but the privacy extensions that many people have installed in their browsers say that I do.
This makes me look bad, so could I do something to fix this?
Well, for Twitter, I stopped embedding tweets long ago, because it adds a bunch of other problems (including that people might delete their tweets). So from a purely journalistic perspective, what I do instead is to take a screenshot of the tweet and post that with a link to the tweet.
To give you a simple example, here is a tweet from @MKBHD, with a screenshot of a tweet from Gal Gadot getting caught endorsing a Huawei phone from her iPhone.
Gal Gadot has since deleted her tweet (and posted a new one), but this doesn't mean that we don't have the right to write about it.
So I always post screenshots of tweets, which also then solves the problem with tracking.
For YouTube, it's slightly more complicated. You can't just screenshot a video, and because a video is much longer than a tweet, we have a much bigger problem with things like copyright and the extra cost of bandwidth.
So, is there a way for me to embed a video so that it won't load anything from YouTube until people explicitly ask for it?
The short answer is yes.
What I have done on this site is create a system where, when you see an article with an embedded YouTube video, the video doesn't load until you click on it. Instead, the only thing you see is a picture with a play button.
Like this (don't click on it yet):
If you have installed a privacy tracker, it will still tell you that there are zero trackers on this site because, until you click, it's just a picture.
I then chose to add the wording "Play this from YouTube" to make it clear to people that when they do click on this, the video is then loaded from YouTube and will start playing.
This way I am again in the clear for GDPR. When you click, it is an explicit consent to play that video, as part of fulfilling the 'contract' you are asking me to deliver.
(BTW: Now you can click to see how it works).
I personally really like this solution. All sites should do this.
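To illustrate the principle, here is a simplified server-side sketch that renders such a placeholder. It is not the code used on this site: this version is a plain link that opens the video on YouTube's privacy-enhanced domain when clicked, rather than swapping in a player on the page. The function name, the CSS class, and the idea that the thumbnail is an image hosted on your own server are all assumptions.

```python
from html import escape
from urllib.parse import quote


def youtube_placeholder(video_id: str, title: str, thumbnail_url: str) -> str:
    """Render a click-to-play placeholder that loads nothing from YouTube.

    thumbnail_url is assumed to point at an image on your own server, so
    the browser only contacts YouTube after the reader clicks the link.
    """
    # Privacy-enhanced embed domain offered by YouTube.
    target = f"https://www.youtube-nocookie.com/embed/{quote(video_id, safe='')}?autoplay=1"
    return (
        f'<a class="video-placeholder" href="{escape(target, quote=True)}">'
        f'<img src="{escape(thumbnail_url, quote=True)}" alt="{escape(title, quote=True)}">'
        f"<span>Play this from YouTube</span>"
        f"</a>"
    )
```

Until the reader clicks, the page contains only a picture and a link, which is why a tracker detector has nothing to report.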
As for sharing buttons, I didn't really have to think about it, because on this site there are none. Today almost everyone just uses the built-in sharing buttons on their phones, or they simply copy/paste the url. So, there is very little need to add your own sharing buttons to your site.
But even if you did, you don't need to add any 3rd party code to your site to do sharing. Here, for instance, is The Guardian, where the sharing buttons are just a small icon (a picture) with a link.
This works perfectly, and since no code is added, it's fully GDPR compliant by default. Just implement this correctly, and you are good to go.
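For illustration, this is roughly how such plain share links can be built on the server. The Twitter and Facebook endpoints below are the commonly used share URLs as I know them, so treat them as assumptions and check the platforms' current documentation; the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def share_links(article_url: str, title: str) -> dict[str, str]:
    """Build plain share URLs; no third-party script is loaded on the page."""
    return {
        "twitter": "https://twitter.com/intent/tweet?"
        + urlencode({"url": article_url, "text": title}),
        "facebook": "https://www.facebook.com/sharer/sharer.php?"
        + urlencode({"u": article_url}),
        # mailto: links expect %20 rather than '+' for spaces.
        "email": "mailto:?"
        + urlencode({"subject": title, "body": article_url}, quote_via=quote),
    }
```

Each value is just an address you put in the `href` of an icon, so the other site only hears about the reader if they actually click.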
But now we come to a big one...
As I mentioned earlier, I had a specific problem with analytics because of the way I am implementing the Baekdal Plus 'paygate' and the way my internal analytics was structured.
Let me quickly explain what I mean by this. Almost all forms of analytics are based on measuring every time people see a page (a hit), which is then added to the database along with any extra information that you might have.
This, for instance, is how Google Analytics works.
What I did instead was to ignore these 'hits' and measure events and behaviors in relation to pages and people (and, if I had multiple journalists, a separate segment called 'writers').
I did this because of something called 'scored analytics' (and partly 'learning analytics'). I don't care how many pageviews I had last month, because such a generalized metric offers me no actual insight.
What I do care about is, how well does each specific reader interact and engage with Baekdal Plus? How well is a specific article performing? And (again, if I were a larger publisher), how well does each journalist perform?
So all my analytics were specifically targeting building up a profile around either the reader, the article, or the writer... depending on the events and behaviors that I detected, rather than just measuring views.
Another thing I did differently was that I didn't measure time. In regular analytics, everything is based on a time-period. For instance, you can see what articles had the most views last week, or what the bounce rate was last month.
The problem is that this is not really useful. It's far more useful to look at the 'score' each thing has.
For instance, in my old analytics, there was no way for me to see how much traffic I had last month. But what I did know was exactly how each specific article performed as a whole and how loyal and engaged each subscriber really was.
This was far more relevant than what had happened over a specific timeframe.
On top of all of this I also used Google Analytics, because GA is great for all the generalized stuff.
So what is the problem here?
Well, with GDPR, you are not allowed to do personally identifiable tracking/profiling without explicit consent. Keep in mind that we are not just talking about subscribers here. We are also talking about the tracking ID that you set for anonymous users. GDPR doesn't allow you to track people, regardless of how you identify them.
Oh ... crap!
Before, I could do all of this because there was an implicit consent that I was allowed to measure what my readers did when they visited my site. But now, I need explicit consent.
So, I needed a new plan.
At this point we need to talk about how complex this really is and what GDPR isn't doing. GDPR is only about personally identifiable information. It's not about data in general.
You can measure everything you want as long as you don't link it to a person.
Let me give you an example. Imagine you are running a grocery store and this happens:
Someone that we don't know comes into your store to buy 2 packs of diapers, 15 pieces of assorted fruit, some low-fat ground beef, two liters of rice milk, and a bottle of makeup remover.
All of this data is something you are fully allowed to measure, because it's just data. And you can also do some pretty fancy analysis to expand this data to know even more about this person.
Here, we don't know who the person is, and we have recorded what happened. And then based on this, we have done some additional analysis.
For instance, because of the diapers we can assume that this person is a parent of a small child, which also gives us a probabilistic age between 25-35.
From the fruits and the low-fat meat we can determine this person is health focused (or maybe trying to lose weight).
From the rice milk, we might think this person is a vegan, but then we remember that she also bought meat, so it's more likely that she is allergic to milk.
And finally, from the time of day, we might determine that this person is probably on maternity leave, because why else would she be shopping at 11 AM?
This is what learning analytics is all about. Instead of just collecting metrics and putting them into a dashboard, you build up actual insights based on the data that you have.
Where GDPR comes into effect is when you want to measure this per person. But it actually gets worse than that, because a part of GDPR is that you can only use data limited to what the person is doing.
Art. 5(1)(c): Personal data shall be [...] adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation');
This means that even if you do have consent, you can't just add data that the person didn't give you. It has to be limited and relevant to what that person is doing.
In practical terms, this means that when you get consent you are allowed to associate the actual data you have with a specific person. But all that extra data you added because you were doing fancy analysis cannot be considered 'limited and necessary' from a GDPR perspective.
In other words, we get this:
This is the problem with GDPR and analytics. You have to make a choice.
You can either ask for consent, which gives you the ability to associate the actual metrics that you have with a person, and allows you to track people over time ... which is really useful for a lot of things.
Or... you can choose to not track people individually, which opens your ability to do more fancy forms of analytics, like learning analytics, audience intelligence, and so forth.
Of course, what you could do is to create two separate analytics systems. One that is based on people (and needs consent) and another that is based on intelligence (anonymized aggregated data). But you can't do both at the same time.
This is also why we currently see a lot of US companies saying that they are no longer going to operate for people in Europe. Their entire business model is based around linking personally identifiable information with data people didn't explicitly give them, which is not directly relevant to the service they got in return.
They are currently asking people to give them consent for everything, but they can't ask for that because GDPR requires the data to be limited ... aka 'data minimisation'. And the EU is already preparing itself to go after Facebook if they don't get their act together before May 25th.
This is a really big thing to understand about GDPR. When people give you their consent, you can match that to the actual specific data that they gave you in relation to the service they used. All your other data can't be associated to a person, even if you do have consent.
So, this is the principle that I have now built my new analytics system around. I now have three separate databases.
The first one is the 'accounting/subscriber' database. This is the database that holds the identity, account, and payment information for all subscribers.
This is data that I already have consent for because, when a person subscribes, they are asking me to create an account, and log them in. Also, this data is something I legally have to store, because the EU requires it in relation to doing taxes and other forms of accounting.
Then I have a completely separate database that measures what you would call normal analytics. This records what articles have been seen, from what browser or device (user agent), from what country, the referral and sharing data and other events. It also contains the most important metric of all, which is my 'read-rate' data, where I measure how much of a page was read, how long it took, and whether it was read all the way to the end.
On top of this, I also use the subscriber database to verify whether each visit is from a verified person or not. The way this works is that it looks at whether the subscription system has identified you or not, and if it has, it will tell the normal analytics system that this is a 'verified view'.
Mind you, I'm not setting any personally identifiable data with this. It's simply a number. It's set to '1' if it's verified, and '0' if it's not. There is no tracking taking place.
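To show what that separation means in practice, here is a minimal sketch of such an analytics record. The field names and the `record_view` helper are illustrative assumptions, not the site's actual schema. The important part is that the subscriber's identity is only used to work out the 0/1 flag and is never written to the record.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PageEvent:
    """One row in the anonymous analytics store (field names are illustrative)."""
    article: str
    user_agent: str
    country: str
    referrer: str
    read_fraction: float  # how much of the page was read, 0.0 to 1.0
    read_seconds: int     # how long the reading took
    verified: int         # 1 if the subscription system recognised the visitor, else 0


def record_view(article: str, user_agent: str, country: str, referrer: str,
                read_fraction: float, read_seconds: int,
                subscriber_id: Optional[str]) -> PageEvent:
    """Build an event; subscriber_id only decides the flag and is then dropped."""
    return PageEvent(
        article=article,
        user_agent=user_agent,
        country=country,
        referrer=referrer,
        read_fraction=read_fraction,
        read_seconds=read_seconds,
        verified=1 if subscriber_id else 0,
    )
```

Calling `asdict()` on the result gives exactly what would be stored, and there is no field in it that points back to a person.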
The normal analytics (with completely anonymous data) is then also sent to Google Analytics, which I'm now doing using 'Universal Analytics' from the server instead of from the client (nothing is sent to Google from people's browsers).
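As a hedged sketch of what a server-side hit can look like, the example below builds a pageview payload following the Universal Analytics Measurement Protocol as it was documented at the time (`v`, `tid`, `cid`, `t`, `dp`, `aip`, and `cd<n>` custom dimensions). Using a fresh random client ID for every hit, and sending the verified flag as custom dimension 1, are my assumptions for the example and not a description of the actual setup.

```python
import uuid
from urllib.parse import urlencode

GA_ENDPOINT = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, page_path: str, verified: int) -> str:
    """Return a URL-encoded Measurement Protocol payload for one pageview."""
    params = {
        "v": "1",                  # protocol version
        "tid": tracking_id,        # property ID, e.g. "UA-XXXXX-Y"
        # A client ID is required. Assumption: a new random one per hit,
        # so two hits can never be linked to the same visitor.
        "cid": str(uuid.uuid4()),
        "t": "pageview",
        "dp": page_path,
        "aip": "1",                # ask Google to anonymize the IP
        "cd1": str(verified),      # hypothetical custom dimension for the 0/1 flag
    }
    return urlencode(params)
```

The payload would then be POSTed from the server to `GA_ENDPOINT` (for instance with `urllib.request.urlopen(GA_ENDPOINT, data=payload.encode())`). Because the request comes from the server, Google sees the server's address and not the reader's browser.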
Finally, I have a third system (which I haven't finished building yet). This is the 'data intelligence' part of the system, where I 'score' each article in order to better help me understand the patterns that the normal analytics can't provide. This is basically the system that I used to have, but now it's only using aggregated data. There is no way to take the input/output from the 'data intelligence' system and match that back to a specific person.
With this I now have a pretty robust analytics system that is fully compliant with GDPR ... without having to ask new visitors for consent upfront, because no personally identifying data is stored until you actually subscribe, and even then, none of that is matched to the analytics data.
So... that's cool.
The problem is that, since I'm not asking for consent, I don't have any way of doing any form of subscriber analytics. I can't see what articles a specific subscriber reads, I can't do churn-rate optimization, I can't track 'momentum' in subscriber engagement over time.
This is a pretty big problem, because this information is a critical part of being able to focus my editorial strategy and to know what is actually relevant for my readers ... and I need some way to get this back.
But the question is, do I need this data for everyone? Or would I be able to get the same level of insight if I only had it for, maybe 5%, of my subscribers?
So, I did a test. I looked at my old analytics system and I picked out about 5% of the audience. I then looked at the patterns for this sample group and compared it to the patterns I could see as a whole ... and while there were some variations, they were pretty close.
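A comparison like the one described above could be sketched as follows. This is not the analysis I actually ran; it assumes, purely for illustration, that the 'pattern' is the share of reads per article category, and the function names are made up.

```python
import random
from collections import Counter


def category_shares(reads: list[str]) -> dict[str, float]:
    """Turn a list of article categories (one per read) into shares of the total."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def max_share_gap(readers: dict[str, list[str]],
                  fraction: float = 0.05, seed: int = 0) -> float:
    """Largest absolute gap in category share between a random sample and everyone.

    readers maps a reader key to the categories of the articles they read.
    """
    if not readers:
        raise ValueError("readers must not be empty")
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    ids = sorted(readers)
    sample_ids = rng.sample(ids, max(1, round(len(ids) * fraction)))
    full = category_shares([c for r in ids for c in readers[r]])
    sample = category_shares([c for r in sample_ids for c in readers[r]])
    return max((abs(share - sample.get(c, 0.0)) for c, share in full.items()),
               default=0.0)
```

A small gap means the sample group tells roughly the same story as the whole audience, which is the property I was looking for.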
This is great because it means that I don't need consent for everyone. I just need it for a small sample group ... but then comes the question of how to actually do this?
As I mentioned earlier, there is no way that I'm going to ask people for consent to tracking before they have converted into subscribers, because that would likely have a very negative impact on subscription rates and revenue growth.
But is there another time when I could ask for consent without any negative effects?
And yes, there is. I have identified two such points in time.
The first one is immediately after people have subscribed.
After people have decided that they want to become a subscriber, after they have paid, and after their account has been created, I show people a 'Welcome to Baekdal Plus' page, where I always ask people if they want to get my newsletter.
Now I also ask them about analytics, like this:
This is a much, much better way of asking for consent. It doesn't interrupt people at the wrong time. Instead, it's part of a decision that people are already making, after the most critical decision (the decision to pay) has already been made.
But this is not the only place I ask for consent. Another point where it made sense to do this is when people have already proven to be dedicated readers, and after they have already enjoyed reading an article.
When is that? Well, to figure this out I can look at my subscription database, where I can see how long ago a person subscribed. So for monthly subscribers, I set the limit at 6 months and 15 days. In other words, if people have renewed their subscription 6 times, that's a pretty good sign that they are probably going to stick around.
For yearly subscribers I set the limit to 1 year (one renewal) and 2 months. Why 2 extra months? Because some people churn immediately after a renewal, so I wanted to add enough buffer so that I didn't ask for consent from people who are already about to leave. Because if I do that, they will definitely leave.
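The two thresholds above can be expressed as a small date check. This is a sketch under the assumption that 'months' means calendar months counted from the subscription date; the function names and the plan labels are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def consent_prompt_date(subscribed_on: date, plan: str) -> date:
    """Earliest date on which the end-of-article consent box may be shown."""
    if plan == "monthly":
        return add_months(subscribed_on, 6) + timedelta(days=15)  # 6 months and 15 days
    if plan == "yearly":
        return add_months(subscribed_on, 12 + 2)                  # 1 year and 2 months
    raise ValueError(f"unknown plan: {plan!r}")


def may_ask_for_consent(subscribed_on: date, plan: str, today: date) -> bool:
    """True once the subscriber has passed the threshold for their plan."""
    return today >= consent_prompt_date(subscribed_on, plan)
```

For example, someone who subscribed monthly on January 10, 2018 would first see the box on July 25, 2018.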
And this consent box looks like this (remember, it's only displayed at the end of an article. It's not a pop-up, nor does it interrupt people in any way).
Again, this is a much better way of getting consent. I'm only asking people who are dedicated readers, and I'm only asking at a point in time and place where it doesn't feel like a burden.
And remember, the goal here is not to get everyone to give me consent, but only to get about 5% to do it.
It's not perfect, but it's better than not having anything. And it's a million times better than asking everyone for consent before they have even converted.
That's analytics sorted!
So, are we done?
No... there is one more thing we need to do. We need to redesign the privacy page so that people have transparency, control, and the ability to delete their data.
And here I did something pretty special as well.
What's really interesting about GDPR, though, is that it specifically states that this is no longer acceptable. Privacy pages must be written using clear and plain language, and, if you don't, the consent is no longer binding.
The principle of transparency requires that any information addressed to the public or to the data subject be concise, easily accessible and easy to understand, and that clear and plain language and, additionally, where appropriate, visualisation be used. [...] Any part of such a declaration which constitutes an infringement of this Regulation shall not be binding.
The same applies to data transparency and control. Not only does this have to be presented in a very simple and clear way to be valid, you also need to give people control over it.
In other words, I can create a privacy page just for you!
This is what I did. My privacy page now dynamically changes according to who you are, what relation you have with this site, what data I actually have about you, and the level of consent you have given.
Every page will be different for every person, but overall this means I now have 5 different pages (but you are always only seeing the one that applies to you).
The above are links to screenshots of all these pages so that you can see the exact difference between each. But let me just quickly give you some highlights.
If you are a free/first time user, as I illustrated above, I'm not collecting any personally identifiable information ... so the privacy page just says this:
Isn't that wonderful?
But let's take a look at the other extreme. Imagine that you are a subscriber and that you have given consent to tracking. Now the page updates to show you the real-time data that I have about you.
That's right. The real-time data... directly from the databases. Let me give you an example from my account.
First, if you are a subscriber, the privacy page now lists all the data that I have directly from the subscription database. For my account this looks like this:
Obviously, I'm hiding the sensitive data like the unique ID and the password (which I couldn't show anyway, because it's encrypted), but this gives you an overview of the account information I have.
As you can see, I have allowed tracking (GDPR consent status = allowed), so for this account, I'm also doing personally identifiable analytics. And this is where it gets really cool because, when you scroll down the privacy page, you will see this:
This is a direct output from the analytics database. It's updated in real-time, and it shows you exactly what was measured: what you looked at, read, and shared, as well as the amplification effect of that sharing.
For instance, at the top of the page you can see that I looked at an article, which I read, and shared ... and then, because of that sharing, 86 other people came to look at it.
You can see what I can see!
Of course you can also choose to withdraw your consent and delete your data. This can be done via a button just below the analytics data, and it looks like this.
Again, when you click this button, the data is simply deleted. There is no 'administration'. You don't have to fill out a form or send me an email. When you click this button, the personally identifiable data is just deleted... instantly!
Also, if you cancel your account, or if your account expires in any other way, this will automatically delete the data.
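To show how little is needed behind such a button, here is a minimal sketch using SQLite. The table and column names (`personal_events`, `subscribers`, `tracking_consent`) are hypothetical, and the real databases on this site are structured differently.

```python
import sqlite3


def withdraw_consent(conn: sqlite3.Connection, subscriber_id: str) -> int:
    """Delete a subscriber's personal analytics rows and record the withdrawal.

    Returns the number of rows deleted. Table names are hypothetical.
    """
    with conn:  # both statements are committed together, or not at all
        cur = conn.execute(
            "DELETE FROM personal_events WHERE subscriber_id = ?",
            (subscriber_id,),
        )
        conn.execute(
            "UPDATE subscribers SET tracking_consent = 0 WHERE id = ?",
            (subscriber_id,),
        )
    return cur.rowcount
```

The same function could be called when an account is cancelled or expires, so there is only one deletion path to maintain.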
It's that simple.
BTW: If you want to see this for yourself, you first have to enable tracking (obviously). But you can do this by clicking on this link. Please note, this can only be done if you are a subscriber. Non-subscribers are never tracked even if they allowed it.
This is how I have implemented GDPR compliance on Baekdal Plus. As you can see, I went kind of crazy with it and took things to the absolute extreme.
There are two reasons why I took this so far:
First, the overall trend around privacy is pretty simple. We can all see where things are heading and how the first rule of privacy will soon become the norm.
Because of this, I didn't want to wait to see what happened. I wanted to get ahead of the game. Also, everything I do on Baekdal Plus is about helping you to do things better. So, I would look rather foolish if I didn't do what I preach.
Secondly, as a media analyst, it's important for me to experiment and to push the boundaries. As such, this site is a way for me to experiment and see, using real data and a real audience, how things work (or break).
There are also a lot of things I didn't talk about here. Not in relation to privacy or GDPR specifically, but in relation to editorial strategies and audience development.
For instance, what do you do if you want more data? One answer might be to think about building new services, where people would want to give you the data because it helps them in return.
In the grocery example above, you could offer people an 'allergy free' shopping experience if people just gave you information on what they were allergic to... which would be really useful to many people.
As publishers, we need to think about data in the same way. What can we do to make people want to share something with us? But this is a topic for another time.
For now, let me end with this:
Founder, media analyst, author, and publisher.
"Thomas Baekdal is one of Scandinavia's most sought-after experts in the digitization of media companies. He has made himself known for his analysis of how digitization has changed the way we consume media."
Swedish business magazine, Resumé