

Plus Report - By Thomas Baekdal - October 2017

Publishers, You Need 'What Should Happen Next?' Analytics


If you have been following Baekdal Plus for a while, you will know that I often talk about the next generation of analytics, like learning analytics, predictive analytics, scored analytics and so forth.

But you might also have noticed that I'm increasingly redefining how we think about analytics. For instance, last month I wrote about something I call 'editorial analytics', where the starting point isn't the data itself, but how we are using analytics to define an editorial focus.

This is so powerful because it helps you understand what you want to know, and then you design your analytics around that. And if you do that around an editorial goal, your analytics suddenly become a tool for journalists to use, rather than just some random numbers in a dashboard.

If you haven't yet read "How Editorial Analytics can Help you Define your Editorial Strategy", I strongly encourage you to do so.

In this article we will do something similar, but I will talk about another part of analytics. We are going to talk about something I call 'what should happen next' analytics.

In many ways this is another name for predictive analytics, where you use massive amounts of data to predict things, but it's also an extension to scored analytics, where you assign a value to different interactions in order to understand how much of an impact your articles really have.

But what makes 'what should happen next' analytics special is that it's also a mind game that will help you understand the actions of your readers better.

So, let's dive in!

First, let's talk about why we need a different type of analytics.

The digital world has a problem

The reason analytics (and algorithms in general) are often so hard to get right is because we have a problem in the digital world, and the problem is lack of intent.

There are generally two things that keep an industry healthy. One is the level of competition, which will constantly push companies to create products that are better than their competitors'. And the other is the need of the customers (their intent), which defines the level of quality that people expect a company to deliver.

We see this, for instance, when we look at the footwear industry, with companies like Nike.

Nike creates good products because of these two factors. The pressure from their competitors forces them to always be a little better, but the real power comes from the need of customers.

When people go out to buy a new pair of sneakers, they do so because they are looking for something specific. When you are looking for a pair of shoes for running, you expect a particular level of design, comfort and performance. And before you buy anything, you make sure you make the right choice by weighing all the options.

In other words, there is a deliberate and focused intent.

Because of this, Nike is forced to create good products. If they just made crappy shoes, they would quickly lose their market because this 'intent' would eliminate them as a choice.

Or to put it in business terms, the competition forces you to continually optimize your business, while the intent of your audience forces you to focus this optimization on quality.

If you take either of these things away, that's when things start to go wrong. Without competition, you end up with very expensive and not very good products. And without intent, you end up with products that are basically just low-intent crap.

So what does this have to do with the media industry?

In the media industry, we are often faced with the problem that one of these two factors is missing, which leads to some pretty bad results, and this is especially true online.

In the digital world, almost all media consumption is the result of people having no intent, while the level of competition is just crazy.

The result of this is pretty scary, because it has led to a very long list of bad behaviors and bad content online.

We see this absolutely everywhere.

Every time publishers talk about optimizing for digital, they do so without considering an intent, and the result is articles that, frankly, aren't worth reading.

We also see this with social channels, where the lack of intent forces everyone to optimize for content as a snack. And the effect of this is devastating the media industry.

Almost every publisher has been forced to decimate their value to produce more content for people who don't really care to begin with.

And this has a pretty big effect on the analytics we use.

The problem is that if you apply analytics to a market that lacks intent, the algorithms will automatically start to optimize for the wrong things.

Just look at Facebook. Every time publishers start to look at analytics to make Facebook work better for them, the result is a product that is less valuable than what they started with.

The same is true for advertising.

Here is a simple question: Which one is better: More clicks or better clicks?

The answer depends entirely on the intent of the audience. If the audience has no intent and is basically just browsing around while bored, more clicks provide a more valuable result than better clicks.

This is the maddening reality publishers face each day. Most of the publishers I talk with dream about being able to do more premium high-value advertising, but you can't do that if your audience doesn't have a specific intent.

If people are just bored and clicking on links at random, it doesn't matter how good your articles are. When they see the ad on your site, they don't care, because they have no need to fill.

So, the key to doing premium advertising is to first change your audience's intent.

But where this really comes into effect is when we talk about subscriptions. Having a low-intent audience wreaks havoc with any subscription strategy, and your analytics often won't help you either. The reason is that every data point you have tells you that you need to optimize for more low-intent, because those are the biggest numbers.

Low-intent drives more clicks, gets you more traffic, increases engagement, creates more social sharing, and reduces bounce rates.

You see the problem here?

All our metrics tell us that we will get a better result focusing on the low-intent... and yet, when we do so people stop subscribing.

So, we need a different type of analytics, specifically one that doesn't encourage us to optimize for low-intent and which isn't defined around single metrics.

We need a type of analytics that can understand the momentum that happens over time.

For instance, it doesn't matter that someone isn't reading an article today if they still end up coming back tomorrow. But if someone reads an article today and doesn't come back... that's bad.

And this is where 'what should happen next' analytics comes in. Because with that we can define our metrics around an outcome.

Making 'What should happen next' analytics work

The way to think about these analytics is to look at what you want at the end.

For publishers, this is usually fairly simple to answer, because what we want is to get people to subscribe, which is a very simple concept (but often very hard to achieve). This includes another goal, which is to get people to resubscribe or to stop people from churning, depending on how your subscription flow works (automatic vs manual).

So we already know what should happen next.

When people see an article, what should happen is that they should (re)subscribe.

But how do we measure if we achieved these outcomes? The answer is to look at the pattern of what people do, and the easiest way to do that is to look at your 'known' audience. So go in and look at your existing subscribers, and then define that as your pattern for a successful outcome.

This is essentially how machine learning is able to do amazing things too.

Let me give you an example. The reason Google is now able to identify what is in a picture is that they started out with a known set of pictures and then asked their algorithms to guess what was in each one.

For instance, Google might show their computer the pictures below and then ask it to answer: "Is this a dog or a mop?"

The computer will then look at all the different data points it has, the color, spacing, the texture, etc... and attribute a value to each one. And at the end of the process it sums it all up and tells you this:

As you can see, it has correctly identified all the mops, but it did a pretty poor job at identifying the dogs.

So, Google's engineers tell the computer: "No, this was not the correct result. Try again!"

And then the computer will go back and readjust all the values it had set for different data points, adjusting some down and others up, and try again... and it will keep doing that until it gets an accurate result.

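This guess-and-adjust loop is how supervised machine learning is trained. Google's AutoML (Automatic Machine Learning) takes it a step further, letting machines run more of the process automatically, including the design of the models themselves.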

And today the outcome of this is that, for things like image recognition, Google's AutoML is able to learn so effectively that it can outperform humans.

This is amazing.

But the key thing to understand here is that this is all possible because their test cases are based on known outcomes. When the computers learn to spot whether something is right or wrong, it's because we know what the outcome is supposed to be.
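To make the "guess, compare with the known answer, adjust, try again" loop concrete, here is a minimal sketch of a toy perceptron in Python. It is not how Google's systems work; the two features (`has_eyes`, `stringiness`) and all the example values are invented purely for illustration.

```python
# Toy perceptron: guess, compare with the known answer, adjust the weights, repeat.
# The features and labels below are made up; real image models are far larger.

def predict(weights, bias, features):
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0  # 1 = "dog", 0 = "mop"

def train(examples, epochs=50, lr=0.1):
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                # "No, this was not the correct result. Try again!"
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, features)]
                bias += lr * error
        if mistakes == 0:  # every known example is now answered correctly
            break
    return weights, bias

# Known outcomes: (has_eyes, stringiness) -> label
EXAMPLES = [
    ([1.0, 0.6], 1),
    ([1.0, 0.9], 1),
    ([0.0, 0.8], 0),
    ([0.0, 1.0], 0),
]
weights, bias = train(EXAMPLES)
```

The only reason the loop can correct itself is that every example comes with a known label, which is the point made above.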

We can then take this concept and apply it to publishing, where we also know what the outcome is supposed to be.

We know who subscribed and who didn't. We know who resubscribed and who churned.

So, we can do the same to analytics.

If you want to understand what makes people resubscribe and what makes them churn, you don't look at pageviews or bounce rates. You look at all the interactions that person had, and from that you determine the pattern that created that result.

And then when you have trained your system to spot this pattern (across all your known sets of data), you can start to apply it to your unknown data.

For instance, if you know what pattern makes people resubscribe, you can compare that to people who haven't yet resubscribed and get an indication about whether they are likely to do that or not.

Do you see how much potential there is here?
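One very simple way to compare an "unknown" subscriber against the known patterns is a nearest-average comparison. The sketch below is an assumption about how you might do it, not a prescribed method: the three features (read rate, share of visits from the newsletter, share of visits from social) and all the numbers are hypothetical, and each is kept on a 0 to 1 scale so that no single feature dominates the distance.

```python
from math import dist
from statistics import mean

# Hypothetical features per subscriber, each on a 0..1 scale:
# (read_rate, newsletter_share, social_share)
KNOWN = {
    "resubscribed": [(0.8, 0.6, 0.1), (0.7, 0.5, 0.2), (0.9, 0.7, 0.0)],
    "cancelled": [(0.2, 0.0, 0.9), (0.3, 0.1, 0.8), (0.1, 0.0, 0.7)],
}

def centroid(rows):
    """Average each feature across the subscribers of one known group."""
    return tuple(mean(column) for column in zip(*rows))

def closest_pattern(features, centroids):
    """Name of the known group whose average behavior is nearest."""
    return min(centroids, key=lambda name: dist(features, centroids[name]))

CENTROIDS = {name: centroid(rows) for name, rows in KNOWN.items()}

# A subscriber whose outcome we don't know yet:
print(closest_pattern((0.75, 0.5, 0.1), CENTROIDS))
```

This only gives an indication, and it treats every feature as equally important, which a real model would not.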

Obviously, in the future this will be done by AutoML or similar technologies, but you don't have to use machine learning for this. You can do this manually too.

Let me give you an example:

Imagine that you want to know the difference between your loyal audience (those who resubscribe) and your disloyal audience (those who cancel).

What we need to do is build a model that we can then apply to each subscriber to see how things are going.

To build this, we first figure out what is important and what isn't. And the way you do this is take all the subscribers that you already know the outcome for. So, output a list of all the people who cancelled over the past year, and all the people who have resubscribed.

Now we need to figure out what the difference in behavior is between these two groups. You can do this in many different ways, but I'm going to show you a very visual way of doing this with iconography as follows:

Here you see four different types of page interactions. People can either look at a non-article page, like the front page, they can visit an article (but not read it), start to read an article (but not finish it), or actually read it.

And then you have all these extra lines. You have a line coming in from the bottom indicating a source (in this case from Facebook). You have a line going straight up indicating sharing. You have an arrow pointing out indicating when they left. And there is the dotted line if people came back to finish what they started.

So what I want you to do is take a sample of each group of people (resubscribers and cancelled subscribers) and just map them out like this... for, say, the past six months.
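If you would rather map this out as data than as drawings, the same iconography could be captured in a small structure. This is only a sketch; the names `Depth`, `Interaction` and `summarize` are hypothetical and not part of any analytics product.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Depth(Enum):
    NON_ARTICLE = "non-article page"
    VISITED = "visited, not read"
    STARTED = "started, not finished"
    READ = "read"

@dataclass(frozen=True)
class Interaction:
    day: int                           # days since the start of the sample window
    depth: Depth
    source: Optional[str] = None       # e.g. "facebook", "newsletter"; None = direct
    shared: bool = False               # the line going straight up
    left_site: bool = False            # the arrow pointing out
    came_back_to_finish: bool = False  # the dotted line

def summarize(timeline):
    """Count what one person did over the sample window."""
    return {
        "depths": Counter(i.depth for i in timeline),
        "sources": Counter(i.source or "direct" for i in timeline),
        "shares": sum(i.shared for i in timeline),
        "returns_to_finish": sum(i.came_back_to_finish for i in timeline),
    }

# Invented sample timeline for one person
SAMPLE = [
    Interaction(0, Depth.READ, source="newsletter"),
    Interaction(3, Depth.STARTED, source="newsletter", left_site=True),
    Interaction(4, Depth.READ, came_back_to_finish=True),
    Interaction(10, Depth.NON_ARTICLE),
]
SUMMARY = summarize(SAMPLE)
```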

So, here is an example for someone who resubscribed:

As you can see, there is a lot of pink (read) and blue (started reading). The source is often a newsletter (indicating an intent) and there are a few direct visits (white circles), and several examples of them coming back to finish what they started. You might also notice that there is no sharing, and no traffic from search.

Now we compare this to a person who cancelled their subscription:

What we see here is a different picture. Now there are far more misses, in that this person clicked on several articles, but never ended up reading them. We see that almost all the traffic originated from social channels (blue circles), and there is no traffic from the newsletter (likely because they aren't getting the newsletter). We also see that this person did start to read some things, but almost never finished.

And, just before they cancelled, we see a noticeable drop in engagement, with much longer durations between visits.

When you see this, it's not really surprising that this person stopped subscribing.
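The "much longer durations between visits" signal is easy to compute once you have visit dates per person. A minimal sketch, assuming you can export those dates; the `recent` and `factor` thresholds are arbitrary illustrations, not recommended values.

```python
from datetime import date
from statistics import mean

def visit_gaps(visit_dates):
    """Days between consecutive visit days (several visits on one day count once)."""
    ordered = sorted(set(visit_dates))
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def is_slowing_down(visit_dates, recent=3, factor=2.0):
    """True if the latest gaps are much longer than the earlier ones."""
    gaps = visit_gaps(visit_dates)
    if len(gaps) <= recent:
        return False  # not enough history to compare
    earlier, latest = gaps[:-recent], gaps[-recent:]
    return mean(latest) >= factor * mean(earlier)

# Invented example: regular visits every two days, then long pauses
FADING = [date(2017, 1, d) for d in (1, 3, 5, 7, 9, 20)] + [date(2017, 2, 5), date(2017, 2, 25)]
STEADY = [date(2017, 1, d) for d in range(1, 20, 2)]
```

On its own this is just one more single metric; it only becomes useful as part of the overall pattern.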

Obviously this is just one example, and I'm exaggerating a bit to illustrate it. In the real world it's not going to be as simple as this, but think about the pattern that you see here.

It's not that one single thing defines whether someone resubscribed or cancelled. It's the combination of signals that point to one direction or another.

And when you do this for every person coming to your site, you will notice a very big overlap.

Here is an example of multiple people combined:

What you see here is that the read rate isn't a specific number, because some subscribers don't actually read that much, while some people who cancelled did. But overall, there is a kind of pattern.

And by doing this you suddenly have a much clearer picture of how behaviors change over time, and what that might lead to. And by looking at all these behaviors as a whole, you are able to predict changes to what people might do in the future.

Obviously, it's never going to be 100% accurate. There are so many things that can happen that might cause people to cancel their subscriptions. For instance, last month I had a longtime subscriber who cancelled his subscription because he got a new job that wasn't about media, and his Baekdal Plus account was paid for by the media company he'd worked for.

So, his pattern was perfect right up until where he was gone. And there are always examples like that, so it's never perfect.

But, as you can see, by thinking about analytics this way, you can get a much clearer picture of what it is exactly that works for you.

Focusing on what should happen

The really interesting element, however, is how we can use this to change things.

If we go to the examples from before, we saw this:

Here you can see that the importance of social and the importance of the newsletter move in opposite directions. In this specific case, it's because the 'intent' has a big impact, since social media is often low-intent, and newsletters are often high-intent.

This helps you define what you need to do... not just for all your readers as a whole, but for every single reader on an individual basis.

For instance, if you see that a reader has a very low engagement with the newsletter, you might want to do something to change that.

This is the power of 'what should happen next' analytics. It gives you the insight to figure out how to influence each individual reader.

But, remember, this is not about single metrics. What you really want to do is to look at this as a whole and try to get the overall pattern to match. It doesn't matter if one or more single metrics aren't perfect. Some subscribers, for instance, aren't into newsletters.

So look at the big picture. For instance, this reader (the pink lines) obviously needs some work compared to a successful pattern.

But also remember that this isn't about traditional metrics either. Look at metrics that tell you something about your editorial focus.

Let's look at an example. This is a site that covers a lot of different motorsport events around the world.

So, ask yourself this: How important is it for keeping people subscribed, that they watch several motorsports events rather than just a single type?

Look at your loyal subscribers and look at what they do. How many different types of motorsport events do they actually follow? Do they mostly just follow a single type (like Formula 1, WEC, WRC)?... Or do your loyal readers follow more than one?

Think about how important knowing this is to your subscription strategy. If your loyal subscribers mostly follow more than one type of motorsport event, the best subscription strategy might be to create a kind of 'all-access' plan, like Netflix. Whereas if people only follow specific events, the best subscription plan might be to offer people more specific choices, like a Formula 1 subscription plan, a WEC subscription plan, etc.

This simple approach can tell you so much about your strategy.
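The motorsport question can be answered with a simple count. The sketch below assumes you can export a read log of (subscriber, series) pairs for your loyal subscribers; the subscriber IDs and the log itself are invented.

```python
from collections import Counter

# Hypothetical read log for loyal subscribers: (subscriber_id, series)
READS = [
    ("a", "Formula 1"), ("a", "WEC"), ("a", "Formula 1"),
    ("b", "Formula 1"),
    ("c", "WRC"), ("c", "WEC"), ("c", "Formula 1"),
    ("d", "WEC"), ("d", "WEC"),
]

def series_breadth(reads):
    """Map 'number of different series followed' to 'number of subscribers'."""
    followed = {}
    for subscriber, series in reads:
        followed.setdefault(subscriber, set()).add(series)
    return Counter(len(series_set) for series_set in followed.values())

BREADTH = series_breadth(READS)
```

If most loyal subscribers land in the "2 or more" buckets, that points toward the all-access plan; if most land in "1", it points toward event-specific plans.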

Or look at your journalists. Is there a pattern that defines loyal readers in relation to which articles from specific journalists they read?

Imagine you had this:

Here we see the read rate for each journalist for all your most valuable subscribers. Notice how Deidra Carlton really stands out, and is apparently a very important part of your subscription value. Also notice how Perry, Mark, and David aren't really that important.

The next thing to do is to look at the people who cancelled their subscription after only a single month, and look at which articles from which journalists they read.

...and you might see this:

Now you see that this group of people mostly didn't see Deidra's articles at all, so they never got exposed to the really valuable content. But they did see Mark's articles.

In other words, the people who cancelled after only one month mostly read Mark's articles, whereas the people who stayed loyal didn't.

That's not good.

So, something clearly needs to change here. But it might not be Mark's fault. It might be an editorial problem. Mark might have been hired to write 'socially optimized listicles', because the editors thought that would be a way to get more traffic from Facebook.

But as we can see, while this did kind of attract more traffic, it also led to people never staying loyal.

In other words, you know now what should happen next.
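Here is one way the journalist comparison could be computed. It is a sketch under an assumption: "read rate" is interpreted here as each journalist's share of all the articles a group read, which may differ from how you define it. The read counts are invented; only the names come from the example above.

```python
from collections import Counter

def share_of_reads(reads):
    """Fraction of a group's article reads that went to each journalist."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def loyalty_gap(loyal_reads, churned_reads):
    """Positive = read more by loyal subscribers, negative = more by churned ones."""
    loyal, churned = share_of_reads(loyal_reads), share_of_reads(churned_reads)
    return {name: loyal.get(name, 0.0) - churned.get(name, 0.0)
            for name in set(loyal) | set(churned)}

# One entry per article read, invented numbers
LOYAL = ["Deidra"] * 6 + ["David"] * 2 + ["Perry"] + ["Mark"]
CHURNED = ["Mark"] * 7 + ["Deidra"] + ["Perry"] + ["David"]
GAP = loyalty_gap(LOYAL, CHURNED)
```

A large negative gap doesn't say the journalist is the problem, only that this is where to start asking editorial questions.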

And there are so many other things that we can learn as well...

For instance, at the recent ONA17 event, Josh Schwartz from Chartbeat said that they had found the importance of the home page increased as people became more loyal.

This is kind of interesting, so is that also true for you? And if it is, what should you do about it?

If you see that your home page is viewed more by people who subscribe, and less by people who don't, this tells you a lot about what your home page should be optimized for.

It should not be optimized for random people doing random things (non-subscribers). Instead, it would be better if you optimized it to be a more valuable place for your subscribers to come back to.

For instance, since you know what your subscribers are likely to do (it's a known audience), maybe you want to make the home page more personalized? If you know that subscribers have shown repeated interest in a specific topic, maybe those news stories should be at the top? Knowing things like this can help create a better experience.

But again, the key here isn't any single thing. It's the pattern.

Is it really this simple?

As you can see, all of this sounds pretty fancy and it looks pretty straightforward. But is it really this simple? Can you just ask the person in charge of your editorial analytics to do this tomorrow?

No, of course not.

Everything I have shown in this article is a kind of stylized and simplistic version, in order to better illustrate this concept. For instance, when I illustrated read-rates, I showed you this very clean illustration, with a defined center average, and a clear difference.

In the real world, you are probably going to see something different, like this (or worse).

Notice that the difference between the two groups is much smaller, and there isn't just one type of behavior. When we look at the people who resubscribe now, we see that there are two different patterns. There is a low-pattern and a high-pattern.

So, doing this kind of analysis is not going to be super simple, because that's how the world works. But this is still a much better way of doing analytics than anything you are used to.

Another massive problem that you'll face when you try to do this is that you often don't have the right data. Standard analytics aren't designed to help you identify patterns of behavior, nor are they designed to analyse people.

Standard analytics are based on activity where the data is static from the point when it's recorded (you can't change it), and vital metrics like 'what person this data relates to' are only applied in those few cases where you have that information.

The result of this is that standard analytics are terrible at identifying what people really do, and especially terrible at building up a pattern over time.

Let me give you two simple examples:

The problem with 'users' is that standard analytics only measure this after people have logged in, because that's the only point when the site knows who someone is.

So imagine that you have a person who does this:

Here we have a person (me), who visits the front page of my site, clicks on a 'Plus' article (which I can only read if I have logged in), logs in, and is then returned to the same article, which I then read.

So what we have here is one person, visiting 2 pages (the front page and a specific article), with a read-rate of 100%.

But if I look at my regular analytics, it has instead recorded that an undefined person visited two pages and didn't read any of them. I also had an identified person who visited and read one page.

This is completely bonkers, but that's how normal analytics works. So if you are just using normal analytics, you have no way to build up a pattern of how people really behave.

What you need is analytics that you can update, so that once people log in, you can go back to the data you recorded before that event, and assign the right user to that data as well.
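As a minimal sketch of what "going back and assigning the right user" could look like, assume every recorded event carries a session identifier and an optional user ID. The field names (`session`, `user_id`, `page`, `read`) are hypothetical, and real systems would also need to handle shared devices and multiple logins per session.

```python
def stitch_sessions(events):
    """Return a copy of the events with user_id backfilled within each session."""
    session_user = {}
    for event in events:
        if event["user_id"] is not None:
            # Remember the first user who logged in during this session
            session_user.setdefault(event["session"], event["user_id"])
    return [
        {**event, "user_id": event["user_id"] or session_user.get(event["session"])}
        for event in events
    ]

# The visit described above: front page, blocked article, login, article read
RAW = [
    {"session": "s1", "user_id": None, "page": "/", "read": False},
    {"session": "s1", "user_id": None, "page": "/plus-article", "read": False},
    {"session": "s1", "user_id": "thomas", "page": "/plus-article", "read": True},
]
STITCHED = stitch_sessions(RAW)
PAGES_BY_USER = {e["page"] for e in STITCHED if e["user_id"] == "thomas"}
```

After stitching, all three events belong to one person who visited two distinct pages, instead of one anonymous visitor and one identified reader.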

Another example of this is read-rates, and this is something I have talked about many times before. If you just have simple articles, like those often seen in newspapers, you can measure this in the simple way of just looking at how people scroll down a page.

But as soon as your content gets slightly more complicated than this (like with longer articles, or articles that people can use), you start to notice a much more complicated pattern.

For instance, one of the things I often see with my subscribers is that it can take them several visits before they actually read an article.

For example, when a subscriber receives my newsletter, she might click on it just to see it, but she isn't reading it because it's not convenient at that time. And then later, she will come back to the same article, read parts of it, and then finish a day after that.

What you end up with is this:

The problem is that this doesn't represent how people actually behaved. While you technically had 3 pageviews, they all went to the same page. And since the reader finished the article during only one of those visits, your analytics thinks this means you had a read rate of 33%... when it was really 100%.
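The difference between the two numbers comes down to what you divide by. A small sketch, with hypothetical field names, that computes the read rate both per pageview and per reader-and-article:

```python
def pageview_read_rate(views):
    """The standard way: finished pageviews divided by all pageviews."""
    return sum(v["finished"] for v in views) / len(views)

def article_read_rate(views):
    """Per reader and article: did they finish it on any of their visits?"""
    finished = {}
    for v in views:
        key = (v["reader"], v["article"])
        finished[key] = finished.get(key, False) or v["finished"]
    return sum(finished.values()) / len(finished)

# One subscriber, one article, three visits, finished on the last one
VIEWS = [
    {"reader": "r1", "article": "a1", "finished": False},
    {"reader": "r1", "article": "a1", "finished": False},
    {"reader": "r1", "article": "a1", "finished": True},
]
print(pageview_read_rate(VIEWS), article_read_rate(VIEWS))
```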

It's simple things like this that make normal analytics so misleading, and there are hundreds of other examples like it.

And you might say that this is a special case that applies only to long content, but it isn't. Imagine that you have a fitness magazine where you have posted videos about how to do yoga.

How would you measure the read rate for that? Would you just measure based on how quickly people scrolled down the page (the standard way), or would you try to measure use?

For instance, if your fitness article talks about a 20-minute exercise that people can do at home, aren't people supposed to stay on that article for those 20 minutes while they use the article?

As publishers we need to think about all of this in a much smarter way, and normal analytics is not good at that. So, a big part of getting started with 'what should happen next' analytics, is to start collecting analytics data that fits you specifically.

Start by asking questions.

What is it that you want to know about your readers, their behavior, your journalists, the focus on your stories, etc? Then design your analytics to give you those answers in the way I described above.

And once you have this, you suddenly have a much better tool to help you figure out what you need to focus on, not based on some generalized view that every other publisher sees, but based on what you need to do specifically.

But the most important part of this article is to focus on the pattern, rather than a specific metric. The old way of doing analytics by focusing on optimizing for each specific metric doesn't work for publishers.

I don't care if you managed to boost your pageviews by 10%, if you got 30% more views on Facebook, or if your bounce rate was lowered by 5%. None of those metrics mean anything.

What I want to know is whether all the metrics combined put us closer to our ultimate goals. As I said in the beginning:

it doesn't matter that someone isn't reading an article today if they still end up coming back tomorrow. But if someone reads an article today and doesn't come back... that's bad.

Compare your patterns rather than your single metrics.



Thomas Baekdal

Founder, media analyst, author, and publisher.

"Thomas Baekdal is one of Scandinavia's most sought-after experts in the digitization of media companies. He has made himself known for his analysis of how digitization has changed the way we consume media."
Swedish business magazine, Resumé

