8 Steps for Eliminating Bad Data in Google Analytics

December is a weird month for analytics reporting. This is the busiest time of year for many e-commerce companies. Employees are working around the clock to ensure that their websites are performing well.

Most other companies use this time as an opportunity to step away from the computer. They recharge their batteries. They use up their vacation days. Many employees take a significant amount of time off from their jobs over the holidays.

While you were home alone, the spambots messed with your data

This year, many of us will return to our analytics reporting and notice something strange in our reports. There are a lot of referrals from a website we have never seen before!

It might look something like this:

forum.topic.darodar.com

The topic number could be anything from 1 to 100 million. All the referrals are from a domain called darodar.com.

For smaller websites, this may show up as a top 10 traffic source for December. For tiny websites, this may be your largest source of traffic!

Here is an example where the forums are cracking into the top 10 of a website.

Forum Darodar.com

As you can see from my handy annotation of this traffic, these 127 visits are crap. They are not from real people. Your website did not go viral. Your website was not mentioned in some kind of forum.

You were visited by the Santa Claus of traffic: someone who you may have believed in when you were a young analyst. Then you realized that it was just your parents pulling one over on you.

Or in this case, you realized that it was just an international spy-bot.

How can you tell that the traffic is crap?

Traffic from Russia

It’s pretty easy to tell when the traffic you are receiving is crap. The longer you work in web analytics, the more experience you have dealing with these problems.

I can tell when traffic is junk in about 0.0002 seconds using a mental checklist I have developed over the years.

These are the rules that I teach to the hundreds of students who have gone through my Google Analytics Mastery course.

Here is the checklist!

1) If it sounds too good to be true, it is

Traffic rarely falls from the sky. Yes, there are moments where something you write goes viral. Or gets picked up by an influencer and spreads.

But these are rare moments for 99.9999% of web pages.

When you look at analytics reports, look to see what has changed since the last period you analyzed. If something major has changed, always assume the data is wrong.

Blind trust in numbers is more dangerous than not looking at the numbers at all.

2) Visit the referral link listed and see what it says

Unfortunately, I did not get a screenshot of the darodar.com referrals when the site was live. But I can assure you that it was not a site that appeared to be driving traffic to any of the sites I own. The language was Russian, and there were no links.

This is always a dead giveaway that the traffic is not real.

Now, what if the website redirects you to what appears to be a legitimate electronics store? Perhaps this was just a traffic generation strategy.

Darodar.com redirect January 2015

If the URL in your referral report redirects elsewhere, there is little chance that it was legitimate.

3) Traffic has a half-life that usually lasts longer than a few days

The next step of analysis is to understand natural traffic patterns to your website. When something goes viral, it almost always follows the same pattern.

There is a large spike in traffic for a day or two. After a few days traffic cuts in half. A few more days and it cuts in half again. Over time the traffic fades down to zero.

Here is an example of a natural traffic pattern from when a post caught fire on Jeffalytics:

Natural Traffic Pattern

While it is hard to tell with the scale, the post received 100+ visits per day for months after the initial viral push. A big influx of traffic will often last for months or years before it finally reaches 0. This comes from the initial referrer that made you viral.

It also includes returning direct visits to your website from those who discovered you through that source.

Here is what the traffic pattern looks like for the darodar.com forums:

Russian Forum Traffic

Just as soon as it was there, it was gone. No half-life. It increased in traffic over time and then got cut off. This is not a natural traffic pattern.
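If you export daily sessions for a referrer, you can approximate this check in a few lines of Python. This is only a rough sketch, not a Google Analytics feature: the function name `looks_cut_off`, the 10% threshold, and the sample numbers are all assumptions made up for illustration.

```python
def looks_cut_off(daily_sessions, drop_ratio=0.1):
    """Return True if traffic falls off a cliff after its peak.

    daily_sessions: session counts, one per day, oldest first.
    drop_ratio: hypothetical threshold. A day that falls to 10% or less
    of the previous day is treated as a cutoff rather than a decay.
    """
    if not daily_sessions:
        return False
    peak_index = daily_sessions.index(max(daily_sessions))
    after_peak = daily_sessions[peak_index:]
    # Compare each day after the peak with the day before it.
    for previous, current in zip(after_peak, after_peak[1:]):
        if previous > 0 and current <= previous * drop_ratio:
            return True
    return False


# Made-up numbers for illustration only.
viral_post = [20, 900, 450, 230, 120, 110, 105, 100]
spam_referrer = [2, 5, 9, 14, 20, 0, 0, 0]

print(looks_cut_off(viral_post))     # gradual decay after the spike
print(looks_cut_off(spam_referrer))  # traffic stops overnight
```

A natural pattern keeps a long tail, so no single day collapses to near zero. The spam pattern does exactly that.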

4) To confirm that traffic is non-human or bot traffic, look at your referral metrics

The easiest way for me to tell that traffic is junk is a 100% bounce rate.

Or a 0% bounce rate.

Or a 100% new visitor rate.

Or a 0% new visitor rate.

I have analyzed billions of website visits and seen millions of traffic sources. I have never seen a perfect 0% or 100% rate for any metric from human traffic.

It is virtually impossible for this to happen, because of how we collect analytics data. A new visit in Google Analytics means that a visitor cookie was never before set in a browser. If your content goes viral, there is no chance that it will only reach new visitors.

A 100% bounce rate means that every single person who visited your site left without taking any kind of incremental action. I have only seen this happen with Google AdWords traffic. With a tiny budget. Sending traffic to a terrible landing page with no navigation.
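The same test is easy to run over an exported source report. The sketch below assumes each row is a dictionary with hypothetical keys (`source`, `sessions`, `bounce_rate`, `new_visitor_rate`), and the rows themselves are invented. The `min_sessions` cutoff is also my assumption: a source with a handful of visits can hit 100% by chance, so it is skipped.

```python
def flag_suspicious_sources(rows, min_sessions=10):
    """Return sources whose bounce or new-visitor rate is exactly 0% or 100%.

    rows: list of dicts with hypothetical keys 'source', 'sessions',
    'bounce_rate' and 'new_visitor_rate' (rates as percentages, 0-100).
    min_sessions: assumed cutoff so tiny samples are not flagged.
    """
    flagged = []
    for row in rows:
        if row["sessions"] < min_sessions:
            continue
        rates = (row["bounce_rate"], row["new_visitor_rate"])
        if any(rate in (0.0, 100.0) for rate in rates):
            flagged.append(row["source"])
    return flagged


# Invented rows for illustration only.
rows = [
    {"source": "google / organic", "sessions": 1200,
     "bounce_rate": 62.4, "new_visitor_rate": 71.0},
    {"source": "forum.spam.example / referral", "sessions": 127,
     "bounce_rate": 100.0, "new_visitor_rate": 100.0},
    {"source": "newsletter / email", "sessions": 3,
     "bounce_rate": 100.0, "new_visitor_rate": 0.0},
]

print(flag_suspicious_sources(rows))
```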

For more on understanding (and filtering) bot traffic in Google Analytics check out this post. 

5) Use secondary dimensions to validate your findings

While high bounce rates or new visitor percentages are usually a dead giveaway, you may want more evidence of a problem. This is where secondary dimensions come in handy.

Try applying secondary dimensions to your source/medium report. Does the traffic look natural?

I like to look at the service provider report to see if the ISP looks legitimate.

Service Provider by Referrer

You can also find interesting properties by looking at the city, country and region of the visitor.

City Report

After looking through several secondary dimensions, noticeable patterns will start to emerge.
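One pattern worth looking for is concentration: a suspicious referrer whose sessions nearly all share one service provider or one city. Here is a minimal sketch of that idea, assuming you have exported session rows as dictionaries. The function name `top_value_share`, the keys, and the sample rows are all hypothetical.

```python
from collections import Counter


def top_value_share(sessions, source, dimension):
    """Return (most common value, its share of sessions) for one source.

    sessions: list of dicts; 'source' and the dimension name are
    hypothetical keys standing in for exported report columns.
    """
    values = [s[dimension] for s in sessions if s["source"] == source]
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Invented rows for illustration only.
sessions = (
    [{"source": "spam.example", "service_provider": "isp-a"}] * 4
    + [{"source": "spam.example", "service_provider": "isp-b"}]
    + [{"source": "google", "service_provider": "isp-c"}] * 3
)

provider, share = top_value_share(sessions, "spam.example", "service_provider")
print(provider, share)
```

A share close to 1.0 for a single provider or city is one more sign that the visits are not coming from a real, varied audience.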

6) Block out bad traffic as soon as you can

You can choose your own tolerance for when you should filter out bad traffic to your website. Or you can use mine.

Here are two ground rules for when you should filter out your traffic:

  • If a non-human traffic source makes the report for your top 10 traffic sources, remove it as soon as possible
  • If a bot traffic source accounts for more than 1% of your traffic, remove it asap
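These two rules are simple enough to write down as a check. The sketch below assumes you already know which source is non-human and have a mapping of sources to session counts. The function name `should_filter` and the sample numbers are made up for illustration.

```python
def should_filter(source, sessions_by_source, top_n=10, share_threshold=0.01):
    """Apply the two ground rules to one known non-human source.

    sessions_by_source: dict mapping source name to session count.
    Returns True if the source is in the top `top_n` sources or
    accounts for more than `share_threshold` of all sessions.
    """
    total = sum(sessions_by_source.values())
    if total == 0 or source not in sessions_by_source:
        return False
    ranked = sorted(sessions_by_source, key=sessions_by_source.get, reverse=True)
    in_top_n = source in ranked[:top_n]
    share = sessions_by_source[source] / total
    return in_top_n or share > share_threshold


# Invented numbers for illustration only.
small_site = {"google": 5000, "direct": 3000, "spam.example": 127}
print(should_filter("spam.example", small_site))
```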


Much like filtering your internal IP addresses, you want your data to reflect your marketing audience. You do this by eliminating non-marketing visitors from Google Analytics. My rule of thumb is to apply a filter when these visitors represent more than 1% of your traffic.

Why 1%? Because if this traffic is more than 1% of your traffic, it can have a noticeable effect on your ability to analyze results. Let’s take the example of darodar from above and examine further.

Darodar Qualitative Traffic Metrics

All the key metrics in the behavior report are significantly different for this traffic than the rest of the site. Especially the session duration metric. This difference is enough to affect your ability to accurately report on website activity. You need a filter.

How do you apply a filter for this traffic?

Easy. Create an advanced filter with the following pattern to protect against future visits:

Darodar.com Referral Filter

The filter pattern is:

.*darodar.com

While it appears that the darodar.com traffic went away in December, I still recommend a filter. This pattern will prevent it from ever coming back into your reports.
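Before saving a filter, it can help to test the pattern against a few hostnames. The sketch below uses Python's `re` module as a stand-in. Google Analytics applies its own regular expression engine, so treat this as a check of the pattern's logic only. The hostnames are invented, and the escaped variant `STRICT_PATTERN` is my suggestion, not part of the filter shown above.

```python
import re

FILTER_PATTERN = r".*darodar.com"    # the pattern as written above
STRICT_PATTERN = r".*darodar\.com"   # assumption: same pattern with the dot escaped


def is_excluded(referrer, pattern=FILTER_PATTERN):
    """Return True if the referrer hostname matches the filter pattern."""
    return re.search(pattern, referrer) is not None


# Invented hostnames for illustration only.
for host in ["forum.topic12345.darodar.com", "darodar.com", "example.com"]:
    print(host, is_excluded(host))

# An unescaped dot matches any character, so the original pattern also
# matches strings like this one. The escaped version does not.
print(is_excluded("darodarxcom"), is_excluded("darodarxcom", STRICT_PATTERN))
```

In practice the unescaped dot is unlikely to catch a legitimate referrer, but escaping it makes the pattern say exactly what you mean.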

To learn more about filters of spam traffic, I recommend reading this excellent article by Analytics Edge.

7) Apply an advanced segment for your historical data

Applying a filter helps you proactively block future traffic, but what about the past? Advanced Segments can be your best friend here.

Create an advanced segment that uses a regular expression to block darodar.com traffic. Here is how this looks:

Advanced Segment Block Darodar.com

If you are uncomfortable with advanced segments, you can install a ready-made one instead: Install Jeff’s Block Russian Forums segment. You can also find this in the Google Analytics Gallery.

When applied to your site, you may notice a large difference in key metrics like time on site. This site was over-reporting time on site by 15 seconds because of darodar.com!

Advanced Segment Applied
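To see why a small number of junk sessions can move an average, here is a rough sketch that computes average session duration with and without sources matching a pattern. The function name `average_duration`, the tuple layout, and the numbers are hypothetical and are not the figures from the report above.

```python
import re


def average_duration(sessions, exclude_pattern=None):
    """Average session duration in seconds.

    sessions: list of (source, duration_seconds) tuples.
    exclude_pattern: optional regex; sessions whose source matches
    are left out, much like applying an exclusion segment.
    """
    kept = [
        duration
        for source, duration in sessions
        if exclude_pattern is None or not re.search(exclude_pattern, source)
    ]
    return sum(kept) / len(kept) if kept else 0.0


# Invented sessions for illustration only.
sessions = [
    ("google", 60),
    ("direct", 90),
    ("forum.topic1.darodar.com", 300),
]

print(average_duration(sessions))                     # all traffic
print(average_duration(sessions, r".*darodar\.com"))  # segment applied
```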

The only downside of the advanced segment is that it could result in data sampling for large sites. With that said, large sites may not even notice the forum traffic in their reports, so this may not be necessary.

If you noticed more referrals than just those from Darodar.com, we also have you covered. Here is an advanced segment that covers several more odd referrers.

8) Annotate your account to explain the blip in traffic

As the good data citizen that you are, you should let others know about your discovery. Spend 2 minutes annotating your account with an explanation of what happened. Being funny is not required, but it does tend to make analytics less boring.

Annotations for Russian Spammers

There you have it. This is a simple checklist that you can apply to just about any situation you have in analytics.

To recap, here are the 8 steps that you should follow:

  1. If it sounds too good to be true, it is
  2. Visit the referral link listed and see what it says
  3. Traffic has a half-life that usually lasts longer than a few days
  4. To confirm that traffic is non-human or bot traffic, look at your referral metrics
  5. Use secondary dimensions to validate your findings
  6. Block out bad traffic as soon as you can
  7. Apply an advanced segment for your historical data
  8. Annotate your account to explain the blip in traffic
