Strategic insights
The Future of Analytics And The Trend of Demonstrable Causation

The Trend of Demonstrable Causation is completely changing everything we used to know. It is invalidating the very foundation on which we measure interactions today. You cannot fix this by adding features.

Written by Thomas Baekdal on July 12, 2012

Shared By Plus Subscriber

Olivier Lendresse

READ ALL THE PLUS REPORTS

This is Baekdal Plus content. It is shared with you for free by a member. Please reshare it.

There are two extremely important trends happening in the world of analytics. The first one is the Trend of Demonstrable Causation. The analytics world has realized that it simply isn't good enough just to look at an activity between points. Call it the no-bullshit-analytics, where each result must be based on real people, doing real things, and for real reasons.

We see this trend all around us. In Google Analytics we now have incredibly advanced tools around conversion funnels and goal tracking ...and even more fun when it comes to custom reports. And analytics tools like KISSmetrics take it a step further.

We are starting to see the same in social analytics. In the past, most social tools only measured activity (which produces useless results), but now we are measuring people. Facebook Insights, for instance, doesn't measure views, it measures people who viewed. It's a small but very important difference. For instance, if your post had 800 views, but you only sold 2 products, you might think you only had a 0.25% conversion rate. But if you are told that those 800 views were actually only 40 people, your conversion rate is suddenly 5%.

It's the same data, but one focuses on activity (views) while the other focuses on people. One is meaningless and wildly misleading. The other is relevant and valuable.
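The arithmetic behind the views-versus-people example can be sketched in a few lines of Python. The function name `conversion_rate` is only an illustration, not something from an analytics tool; the numbers are the ones used above.

```python
def conversion_rate(sales: int, audience: int) -> float:
    """Return sales as a percentage of the given audience size."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return 100.0 * sales / audience


views, people, sales = 800, 40, 2

rate_by_views = conversion_rate(sales, views)    # activity-based denominator
rate_by_people = conversion_rate(sales, people)  # people-based denominator

print(f"per view: {rate_by_views:.2f}%  per person: {rate_by_people:.2f}%")
```

The sales figure never changes. Only the denominator does, and that alone moves the result by a factor of 20.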

Of course, Facebook is far from perfect. After many people complained about how they use EdgeRank to force people to buy promoted posts (myself included), Facebook decided to remove the stats that indicated just what percentage of your fans is seeing your posts. Not a good move by Facebook. One should never remove relevant analytics just because it's 'inconvenient'.

The Trend of Demonstrable Causation is very exciting, but it's also troubling for many businesses. Once you start focusing on the real cause of data, you will also find answers that you don't want to hear.

For instance, we see this with the battle between brands and publishers. Publishers are still trying to hide the real data, because they know that most of the advertising money is never going to lead to a positive return on investment. Publishers prefer to only tell brands about the 'activity' and not provide the useful data that brands need.

For instance, exposure is a useless number unless you know how many people you reach. If you buy one million views, it is really important to know if that is 8,000 people or 500,000 people. For high-traffic tabloid newspapers one million views could very likely be much closer to 8,000 - but they don't tell you that.

The Trend of Demonstrable Causation is destroying this game of half-truths and deliberately leaving out important facts. And it is putting an intense focus on finding the real numbers. It also changes how we do things. Many businesses today (and methods within businesses) are based on the concept that most will lose for the benefit of the few who win.

This is largely how advertising works. Every advertising platform assumes that most of the money that people spend is lost, and that's why prices are so low. For instance, on Facebook the CPM is just $0.23 - or just $0.00023 (0.023 cents) per view. It's so low because we all know that most of the exposure is meaningless.
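Since CPM is the price of 1,000 views, the per-view cost follows from a single division. A minimal sketch, with `cost_per_view` as a made-up helper name:

```python
def cost_per_view(cpm_dollars: float) -> float:
    """CPM is the price of 1,000 views, so one view costs CPM / 1000."""
    return cpm_dollars / 1000.0


cpm = 0.23                      # dollars per 1,000 views, the figure quoted above
dollars = cost_per_view(cpm)    # cost of a single view in dollars
cents = dollars * 100           # the same cost expressed in cents

print(f"${dollars:.5f} per view ({cents:.3f} cents)")
```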

But what if you could only show ads to people who wanted to buy your specific product, at the exact moment when they were ready to make that decision? What if you could apply the Trend of Demonstrable Causation to advertising? Well, that's where we are heading.

But this won't be good news for Facebook, because it's much more profitable for them to sell many ads at a low price of which most will fail to produce a result, than to only sell a few ads at a higher price.

Every spammer will tell you this. As a platform, you always want to aim for the volume - always! It's much better to trick brands into spending a lot of money for which they get little return, than only getting them to spend money when it counts.

But the Trend of Demonstrable Causation is changing all this, because once we get demonstrable data, brands simply will not waste money on things that don't count. This will wreak havoc with how the online world works today.

We are going to see a huge shift in how online companies can be monetized in the future. Spammers, of course, are the extremes, but every online magazine, newspaper, and most startups are all based on the business of getting brands to buy advertising that results in a low ROI.

Within the next 5-10 years, because of the Trend of Demonstrable Causation, the online world will have to change its business models. Brands can now track exactly how many sales their advertising efforts generate. And they will learn that spending $10,000 for one million views while only selling 80 products is not a good return on investment.
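To make that example concrete, here is a small sketch of the two numbers a brand would look at. The helper names are illustrative, and whether $125 per sale is acceptable depends on the product margin, which the example does not state.

```python
def cost_per_sale(spend: float, sales: int) -> float:
    """Advertising spend divided by the number of sales it produced."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    return spend / sales


def effective_cpm(spend: float, views: int) -> float:
    """Price paid per 1,000 views."""
    return spend / views * 1000


spend, views, sales = 10_000.0, 1_000_000, 80

per_sale = cost_per_sale(spend, sales)   # what each sale cost in advertising
cpm = effective_cpm(spend, views)        # what each 1,000 views cost

print(f"CPM: ${cpm:.2f}, cost per sale: ${per_sale:.2f}")
```

The CPM describes the activity that was bought. The cost per sale describes what that activity caused.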

You have to adjust to a new world where brands demand to get real data (people and not activity), and where each activity is closely linked and tracked to a transaction. You have to be able to demonstrate that you are the cause of a sale. That's the Trend of Demonstrable Causation.

The new world of analytics

The other trend is a direct result of the Trend of Demonstrable Causation. In order to get real data, you need the right foundation - and traditional analytics are ill-equipped to provide it.

The analytics we have today is all based on activity first and value second. To make it worse, it is also based on a single person, using a single device, traversing a single destination in a linear fashion.

This might have been true 15 years ago, but today's user behaviors are very different.

The result is that the very foundation that we base our analytics on is increasingly false, which causes us to draw the wrong conclusions.

Let me give you a couple of examples. Let's start off with the biggest of them all - unique visitors. I have written about this before, so let me simplify it.

Unique visitors are based on cookies. This is how analytics services like Google Analytics can, for instance, track that three visits happening over 2 days are actually caused by the same person. But today there are an increasing number of obstacles put in our way that prevent cookies from being set.

1. People now use more than one device, and since cookies are device specific, you will show up as a new person for each device. One of many problems this causes is a wildly misleading mobile traffic report, which assumes mobile visitors are not the same as desktop visitors:

...when in fact many of them are:

And we all know where the multi-device trend is heading.

2. More and more people use ad-blockers, and they are now also blocking analytics. This causes one person to show up as a new person on every visit.

3. The European Cookie law is now giving more people the tools (and rights) to block cookies, again causing wildly misleading data.

4. The use of private browsing causes cookies to be erased on every visit.

5. On iOS, all visits from within apps, like when people click on a link from their Facebook app, are done as private browsing sessions - and thus show up as new visits. Meaning that if the same person clicks on a link in the Twitter app, in the Facebook app, and via Flipboard, he will show up as 3 different people in your stats.

As you can see, the number of ways cookies can become invalid as a data source is growing. We are increasingly basing our conclusions on a misleading data foundation.

So how bad is it? Well, there is no way to tell, because you don't know how many visits are trackable and how many are not. But consider this:

If we assume the following:

  • Real people per month: one million
  • Number of visits per person: Ten
  • Percentage of uniquely trackable audience: 40%

The result is then that your analytics will show that you have a total of 6.4 million unique visitors ...far more than your real audience of just one million people.

That is both scary, and really fascinating. It's scary because if this is the case, every single conclusion you have made so far, is wrong. But also, the real impact of your digital strategy is only 15% of what you think it is.

It's fascinating, because it means your conversion rates are far better than what you previously assumed. If you have a measured conversion rate of 1.4% of 6.4 million uniques, you actually have a conversion rate of 9% from your real audience.
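The estimate above can be reproduced with a very small model. This is only a sketch under one stated assumption: a trackable person is counted once, while an untrackable person is counted as a new unique on every visit. The function name `measured_uniques` is invented for the illustration.

```python
def measured_uniques(real_people: int, visits_per_person: float,
                     trackable_share: float) -> float:
    """Estimate the unique-visitor count an analytics tool would report.

    Assumption: trackable people are counted once; untrackable people
    are counted as a new unique visitor on each of their visits.
    """
    trackable = real_people * trackable_share
    untrackable = real_people * (1.0 - trackable_share)
    return trackable + untrackable * visits_per_person


real_people = 1_000_000
reported = measured_uniques(real_people, visits_per_person=10,
                            trackable_share=0.40)

# Real audience as a share of what the report shows.
real_share = real_people / reported

# A conversion rate measured against reported uniques, restated
# against the real audience (the number of conversions is unchanged).
measured_rate = 1.4  # percent
true_rate = measured_rate * reported / real_people

print(round(reported), f"{real_share:.1%}", f"{true_rate:.2f}%")
```

Changing any of the three inputs shows how sensitive the reported number is to things you cannot observe directly.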

The problem, of course, is that we don't know that. We have what Avinash so brilliantly calls an unknown-unknown. We don't know the size of our real audience, because we don't know the percentage of our untrackable unique audience ...and we don't know how many visits each person makes.

We cannot even use the data that we already have because it is tainted by an unknown factor of inaccuracies.

For instance, your stats might tell you that you have 22 million page views. How many pages do people then view on average? The answer, according to your analytics, is 3.4 pages per unique (22 million / 6.4 million), when, in reality, it is 22 pages per real person (22 million / one million).

And here comes the scary part. What if, because of the European Cookie law, 4% more people block cookies in their browsers? Now you will suddenly have 6.8 million visitors. Your real audience is still just one million people, but you had a roughly 6% increase in traffic just because of a small fluctuation in cookies.

This wreaks havoc with the Trend of Demonstrable Causation. You don't know if that increase was caused by something you did, or simply because of inaccurate fluctuations in the base data.
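A quick way to see this effect is to hold the real audience constant and vary only the share of people who can be tracked. The sketch below reads "4% more people" as four percentage points (trackable share dropping from 40% to 36%), which is an assumption, and it reuses the simple counting model from the earlier example: trackable people count once, untrackable people count once per visit.

```python
def measured_uniques(real_people: int, visits_per_person: float,
                     trackable_share: float) -> float:
    """Reported uniques under the assumed counting model (see lead-in)."""
    trackable = real_people * trackable_share
    untrackable = real_people * (1.0 - trackable_share)
    return trackable + untrackable * visits_per_person


real_people, visits = 1_000_000, 10

before = measured_uniques(real_people, visits, trackable_share=0.40)
after = measured_uniques(real_people, visits, trackable_share=0.36)

# Relative change in *reported* traffic; the real audience did not move.
change = (after - before) / before * 100

print(f"before: {before:,.0f}  after: {after:,.0f}  change: {change:+.1f}%")
```

Nothing about the site or its audience changed between the two runs, yet the report shows growth.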

It's the same with all our new tools, like the multichannel conversion funnel. It's a brilliant tool, but if you cannot identify your unique audience, it's going to completely skew the result.

In 1995, this phenomenon was insignificant because the only people who would use more than one computer, or more than one browser, were the geeks - and they represented an infinitesimal amount - not worth bothering about.

But today, everyone is a geek. On this site, for instance, 40% of my traffic is mobile, and an unknown number of that is via apps (like Facebook, Twitter or Flipboard). That means that, at least, 40% of my traffic is untrackable - at least!

If I then assume that each person, on average, visits this site 3 times per month, the result is that my analytics is off by 55%!!

The very basis on which our analytics is built has become fundamentally flawed. The mechanism it relies on, the cookie, is no longer a reliable tool. If we want to embrace the Trend of Demonstrable Causation, we also have to change the very foundation of our analytics.

But the problem with unique visitors, while very important, is only a small part of the real issue. I want to point you to two other examples: Time and conversion paths.

Time

Every analytics report is based on a timeframe first, and whatever we want to measure second. This is a fundamentally flawed thing to do because time is rarely a deciding factor.

Take something simple, like a blog. How do you determine which articles are popular and which are not? If you are like most people, you will look at the top 10 articles for a specific month.

The problem, of course, is that you don't see the full picture. When you look at something per month, some articles will have been posted longer than others. It's not relevant to compare individual pieces of content this way.

Everyone who works with analytics knows this, so to 'fix it' we expand the scope. Instead of just showing one month, we look at it for the past 6 months or more. But this doesn't solve anything either. Articles posted 2 months ago are likely to have more volume than those posted two weeks ago.

Confining your analytics to a timeframe is wildly misleading. But this is the very basis of every analytics tool out there.

The solution is very simple. You need to eliminate the time constraint, and measure each article in relation to each other. Now you can immediately identify which articles performed better than another (And the same for products in web shops).

You can, for instance, see which products performed better within the first few days, compared to how they each performed within the first 50 days - or from day 20 to 30.
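One way to remove the calendar is to re-key each article's traffic by its age, meaning days since publication, so that every article is compared over the same part of its life. The sketch below uses made-up dates and visit counts, and the helper names are hypothetical. Day 0 is the publication day.

```python
from datetime import date


def align_by_age(publish_date: date, daily_visits: dict) -> dict:
    """Re-key {calendar_date: visits} as {days_since_publication: visits}."""
    return {
        (day - publish_date).days: visits
        for day, visits in daily_visits.items()
        if day >= publish_date
    }


def total_in_window(aligned: dict, first_day: int, last_day: int) -> int:
    """Sum visits for ages first_day..last_day (inclusive)."""
    return sum(v for age, v in aligned.items() if first_day <= age <= last_day)


# Hypothetical data: two articles published six weeks apart.
article_a = align_by_age(date(2012, 5, 1), {
    date(2012, 5, 1): 500, date(2012, 5, 2): 300, date(2012, 5, 21): 40,
})
article_b = align_by_age(date(2012, 6, 10), {
    date(2012, 6, 10): 200, date(2012, 6, 11): 450, date(2012, 6, 30): 90,
})

for name, article in (("A", article_a), ("B", article_b)):
    print(name,
          "first two days:", total_in_window(article, 0, 1),
          "days 20-30:", total_in_window(article, 20, 30))
```

With this shape of data, "the first few days" and "day 20 to 30" mean the same thing for every article, no matter when it was posted.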

Let me give you one example of this. Here is the unique visitor graph for the past 25 Plus reports on this site. As you can see, once you eliminate the timeframe, it becomes very easy to identify which articles are more popular than others.

But we also learn something else. Look at the correlation between the traffic figures. There is no correlation between what's happening the first day, and what's happening the days afterwards. The most popular article of all had a weak day one. But there is a slight correlation between what's happening within the subsequent days compared to the next two months.

Isn't this fascinating? What we are seeing here is social media at work ...and more to the point, that social is not an event. It's something that builds up momentum over time. The activity on day one represents the effect of your own social activity (like when I tweet and post about a new article), and while that does create a lot of exposure, it has little influence on the social activity that comes next.

But I only know this because I removed time as a data constraint. Removing this small constraint, imposed by our analytics tools, can make a huge difference in analyzing what really works, and why something reacts the way it does.

And it's not just a problem with weeks or months. A small time scale is just as problematic.

Here is a graph illustrating how each one of my articles performed within the first few days of them being published. As you can see, it's a complete mess. Some articles perform much better on day two than on day one.

For a long time I wondered why I saw this pattern. I thought, maybe it was linked to the many studies that tell you that posting at a certain hour is better for retweets than during other times of the day.

But then I started working the numbers, and I realized that it was because of how our analytics tools count days. It's always from midnight to midnight. Meaning day *one*, when posting something at 9PM, is only 3 hours long, while day *one* when posting something at 9AM is 15 hours long. In other words, day one is never 24 hours long if you look at your standard analytics. This is a problem.

So, this is what happens if we, again, remove the time element and make every day 24 hours long (the same length) from the second something was posted:

As you can see, it is a drastically different graph. Now all the lines (except the blue one which was crazy popular) indicate a much stronger day one than day two. What I learned were two things. First of all, constricting your analytics to a time period is directly misleading, and secondly (and perhaps more important), it makes very little difference when something is posted - at least for me.
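The difference between the two ways of counting days can be sketched as follows. The function names are invented for the illustration, and the timestamps are assumed to be naive datetimes in a single time zone.

```python
from datetime import datetime, timedelta


def calendar_day_index(published: datetime, visit: datetime) -> int:
    """Day number as standard reports count it: midnight to midnight."""
    return (visit.date() - published.date()).days + 1


def rolling_day_index(published: datetime, visit: datetime) -> int:
    """Day number when every day is 24 hours long, starting at publication."""
    return (visit - published) // timedelta(hours=24) + 1


def first_calendar_day_hours(published: datetime) -> float:
    """How many hours 'day one' really lasts in a midnight-based report."""
    next_midnight = datetime.combine(
        published.date() + timedelta(days=1), datetime.min.time()
    )
    return (next_midnight - published) / timedelta(hours=1)


published = datetime(2012, 7, 12, 21, 0)   # posted at 9PM
visit = datetime(2012, 7, 13, 8, 0)        # read 11 hours later

print("calendar day:", calendar_day_index(published, visit))
print("rolling day:", rolling_day_index(published, visit))
print("hours in calendar day one:", first_calendar_day_hours(published))
```

A visit 11 hours after publication lands on "day two" in the midnight-based count, but on day one when each day runs for 24 hours from the moment of posting.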

What other things can we learn when we remove the time element? Let me show you three very fascinating things I found on this site.

The first thing is that when you remove the time element and compare each article (or product) as individual objects, you can identify patterns in the data. Patterns you wouldn't otherwise be able to see.

For instance, on this site we see a strong initial exposure, but it only lasts about one day. This is the effect of what I do myself when I, as a brand, share what I have made. This initial boost is followed by a four day period of a high level of sharing - the secondary exposure period. This is the result of people reading the content and reacting (positively) to it by sharing. After day seven, each article then lives the rest of its life in the long tail.

Notice how all articles die out on day seven. It doesn't matter how popular it was in the initial phase. After day seven, the momentum disappears - every time.

I don't know why this is. I'm still trying to figure this one out. But it's a fascinating pattern to watch. This is the kind of thing you learn when you focus on the Trend of Demonstrable Causation and what effect it has.

The next thing we can look at is the power of the long tail itself. Here is another graph that illustrates just how this works.

This is the effect of the initial hyper-burst of exposure, compared to the traffic levels in the long tail. What we find is that the first week of hyper-traffic is the same amount as the first month in the long tail. Meaning that after only 38 days, the long tail has generated *more* traffic than my initial hyper boost of exposure.

The long tail might look dead, but it is really important. After 140 days, the long tail has generated twice the amount of traffic that I got initially.
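If you have daily traffic per article keyed by age, the point where the long tail catches up with the initial burst is easy to compute. The sketch below uses an invented traffic series, not the data behind the graph, and assumes the burst lasts seven days.

```python
def crossover_day(daily_visits: list, burst_days: int = 7):
    """Return the first day (1-indexed) on which cumulative long-tail
    traffic reaches the total of the initial burst, or None if it never does.

    Assumption: the first `burst_days` entries are the burst, and
    everything after that is the long tail.
    """
    burst_total = sum(daily_visits[:burst_days])
    tail_total = 0
    for index in range(burst_days, len(daily_visits)):
        tail_total += daily_visits[index]
        if tail_total >= burst_total:
            return index + 1
    return None


# Hypothetical series: a seven-day burst followed by a flat long tail.
burst = [1000, 600, 500, 400, 300, 200, 100]
tail = [100] * 200
day = crossover_day(burst + tail)

print("long tail catches up with the burst on day", day)
```

On any single day the tail looks negligible next to the burst, which is exactly why it helps to look at the cumulative totals.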

So let's zoom into the long tail itself and see what makes it tick. This graph shows what happens from day 60 to day 90:

What we find are these tiny one to two day bubbles of social activity. It starts because some person found the article, and then shares it. This in turn creates an additional level of exposure. All of these bumps are the effect of social sharing, and this is what keeps the long tail going. It's the organic social effect.

If you prevent sharing, as some publishers do with apps, the long tail stops working. The traffic effect of the long tail is because of sharing.

You might also notice, that while all of these are some form of sharing, none of the bumps go viral. I get an initial boost of traffic that ends after a day or two - hence no viral effect.

Want to see what a viral effect looks like? Here it is:

Follow the purple line and you see that the initial sharing happened on day 33, and from that point it just grew and grew. This is real viral effect, when something grows exponentially from what came before it.

Also notice the ripple effect. Day 36 and 38 had almost no traffic at all. I often see this effect with social media. I don't know what's causing it yet, but something is making the social effect fluctuate. We see the same pattern between the initial exposure and the secondary exposure (in the graph I showed you earlier).

There is another element that is very important to learn, and it relates to real time analytics. Real-time analytics is the new buzz, and everyone is having fun with it. But look at the purple line.

The *cause* of the viral effect happened on day 33, but you are unlikely to notice it in your real-time monitor until day 39 (or possibly day 37). What that means is that if you wait until you see the effect in your real-time monitor, you will be four days out of date - and it would be far too late to do anything about it.

Think about that for a moment. The type of real-time analytics we have today is largely ineffectual, because we need tools that can predict possible real-time effects four days before they happen.

Again, this is the Trend of Demonstrable Causation. Real-time tools like Chartbeat, for instance, are great fun, but they completely miss the point. They show the end-result of something that has already happened, instead of helping us to find the *cause* of why it happened in the first place.

If we want to sell more products or subscriptions, we need to be able to identify and influence the cause of a social effect - not the end-result when it is far too late to do anything about it.

(Actually, we also need to understand why it happened at all, to learn if we can somehow make it happen again on purpose.)

Rethinking conversions

We have looked at the problem with identifying real people, and the drastic effect it has on everything else. And we have looked at how we need to stop limiting our data to a time-frame. Let's end this article with another element that is changing how we interact online - and is thus also changing how we can look at analytics. It's about how we measure conversions.

It's amazing what we can do today in relation to tracking conversion. The tools we have in Google Analytics, for instance, are blowing me away.

The first step was goal tracking. We would identify a desired path we wanted people to take and assign goals to each step. The result is amazing reports like this one.

We can see how many people look at one of your products, how many go to the checkout page, and how many subsequently place an order. It's just brilliant.

But, we quickly realized that this linear path, from A to B, was rarely how people interacted. People interact organically, and they often need to be influenced by several separate interactions before you get a sale.

Cue multichannel conversions, another great tool that allows us to follow a single person across several independent paths. It's great!

But there is a third type of conversion tracking that is even more important. It's people ...or specifically, conversions between people.

Let me give you a simple example. Imagine that you see these two interactions on your site:

One person (#1) visits the site and leaves. You think of this as a failed conversion. You have a 100% bounce rate, no goals achieved, no transaction and thus no conversion. And you think of it as a complete failure, wondering what you did wrong.

The other person (#2) is equally puzzling. This person came to your site and immediately bought a very specific product without any prior interaction, or any additional discoveries along the conversion path. It's great that he bought the product, but you cannot identify what caused it.

There can be many reasons why it happened this way, but one answer lies in multi-person conversion tracking - something that no analytics tool (as far as I know) supports today.

Here is how that works:

The first person (#1), the one you assume to be a failure, was actually the *cause* of the second person (#2) coming to your site and buying your product. The first person didn't need the product himself, but he remembered that one of his friends was looking for just what you are selling.

If you hadn't influenced Person One, you would never have sold anything to Person Two. It's not about a multichannel conversion. It's about a multi-person conversion.

And you might say, "how big can this really be?" Well, this is how almost all social media works. It's not multichannel. It's multi-person, and it's seriously important to know just how much of your sale is the result of another person.

On this site, where I'm tracking this kind of thing, almost 50% of all new subscribers that signed up last month were the result of the illustration above - 50%!! That's huge.

This is why I'm so focused on social sharing. Everything I do is based on this concept. Baekdal Plus is designed around it as well. I know that I need to influence Person One to get Person Two on board. It doesn't matter if Person One buys anything or not. That's not important. What matters is how influenced Person One feels.

Again, this is the Trend of Demonstrable Causation. We are demonstrating that Person One (who looks like a failure) is actually the influencing cause of the real sale.

But there is a problem in the way we measure analytics today. It's based on interactions of a single individual. It assumes that people do not connect and influence each other directly. We need to change that. We need to stop looking at analytics as a single person, using a single device, traversing a single destination in a linear fashion. That's not how the world works anymore.

The connected world is based on multiple people, using multiple devices, traversing multiple destinations in an organic fashion. It's the complete opposite of how we do analytics today.

The Future of Analytics

The Trend of Demonstrable Causation is completely changing everything we used to know. It is invalidating the very foundation on which we measure interactions today. You cannot fix this by adding features, and I believe that we have reached the very limit of what we can do with traditional analytics.

We need a new approach, one that isn't defined by the traditional way of measuring.

The future of analytics is changing dramatically. And the shift is just as profound as the one that the media industry experienced when they had to move from print to digital. It's not about the format, or the tools. What's changing is the very foundation of how things are done.

Print magazines cannot be successful just by creating a similar digital magazine on the iPad, because it doesn't respond to the connected world. It's the same with analytics. Traditional analytics cannot embrace the connected world by just moving it into a new environment and adding a few features. It doesn't work that way.

We have to rethink everything, and it starts with how we measure things to begin with - the raw data itself.

For instance, social sharing is done using mass-market campaign variables (one variable for all). That's wrong. You need to set a different (anonymous) campaign variable for each person, in order to track interactions across people.
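A per-person campaign variable could look something like the sketch below. It is not how this site or any particular analytics product does it. The `ref` query parameter, the `ReferralTracker` class and the visitor ids are all hypothetical, and the code assumes the base URL has no query string of its own.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


class ReferralTracker:
    """Hands out one anonymous share token per person and links
    later visits on that token back to the person who shared."""

    def __init__(self):
        self._token_owner = {}   # token -> anonymous id of the sharer
        self.referrals = []      # (referrer_id, referred_id) pairs

    def share_url(self, base_url: str, visitor_id: str) -> str:
        """Build a share link carrying a token unique to this visitor."""
        token = secrets.token_urlsafe(8)
        self._token_owner[token] = visitor_id
        return f"{base_url}?{urlencode({'ref': token})}"

    def record_visit(self, url: str, visitor_id: str):
        """Return the referrer's id if the visit arrived via a share link."""
        params = parse_qs(urlparse(url).query)
        token = params.get("ref", [None])[0]
        referrer = self._token_owner.get(token)
        if referrer is not None and referrer != visitor_id:
            self.referrals.append((referrer, visitor_id))
        return referrer


tracker = ReferralTracker()
link = tracker.share_url("https://example.com/article", "person-1")
referrer = tracker.record_visit(link, "person-2")

print("person-2 was referred by", referrer)
```

Because the token identifies the sharer and not the campaign, a sale to Person Two can be traced back to Person One even when Person One never bought anything.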

You need to change the very structure of your site or web shop, to get the right data to begin with. Data that matches the multi-funnel nature of the connected world.

It's the very basis of our analytics that is changing.

In this article I have pointed you towards the three main examples of why we need this change: Unique visitors, time, and conversions. But these are only a very small part of the hundreds of other elements that are changing as well.

The concept is still the same: "What's the cause of a transaction?" But the Trend of Demonstrable Causation is forcing us to approach the question from an entirely different direction than what we are used to.

The Trend of Demonstrable Causation is just starting. It has slowly been building up momentum, but it's only within the past 2 years that it has really started to show. It's going to take another two to three years for it to really have an impact.

But by then, we are going to see an explosion in new analytical ideas and startups - each one challenging the traditional tools.

And remember, this is not just a change in the analytical tools that we use today. The Trend of Demonstrable Causation changes everything. It will change the advertising industry, how startups can be formed, their business models - everything!

And this trend is linked to all the other big trends: The social trends, the connected trends, and the digital trends. They are all moving in the same direction, towards an organic multiverse of signals that are 'demonstrable' to the individual.

In 2004, Facebook was formed based on the idea that social activity was a relevant factor. That's why so many people were chasing how many likes they could get. Then it moved on to engagement, another meaningless measurement of activity. Today, it's about meaning.

We see this effect on Google+. Many people think it looks like a ghost town, but that's not it. It was born in a different world, and the Trend of Demonstrable Causation is forcing the interaction to have a higher level of meaning. That's why many posts on Google+ are (slightly) more profound than on any other social network - and why every comment is slightly more in-depth.

It's not that you cannot do the same on Facebook or vice versa. It's that Google+ is closer to the effect of the Trend of Demonstrable Causation.

Thomas Baekdal

Founder of Baekdal, author, writer, strategic consultant, and new media advocate.
