4 Ways to Combat the Rise of Referral Spam - Challenge. Test. Grow. Small moves every day. | FRWD in Minneapolis MN

4 Ways to Combat the Rise of Referral Spam

Referral Spam On The Rise. What The Heck is going on?

In the last year, you may have noticed some weird sites bubbling up to the top of your referrals report in Google Analytics. Don’t be alarmed! You’re probably looking at referral spam.

What is referral spam? It’s falsified traffic that declares the referral source as something spammy like “best-seo-services.info” or “semalt.semalt.com”. The infamous Semalt.com has always been a huge headache. The Ukrainian SEO company uses shady crawler tactics to drive their name to the top of your reports.

I generally dismiss spammy traffic as “those crazy Russian Botnets just doing their thing.” However, in recent months the industry has seen a sharp rise in referral spam, enough to warrant addressing it as a problem. This rise in weird referral sites is the result of spammers targeting your website (or your Google Analytics property directly) and pumping spoofed “ghost data” into your analytics account.

Taking a look at various sites (image below), we’ve seen a notable increase in spam referrals in the first half of 2015.

Google Analytics Referral Spam

Traffic segment of identified referral spam across websites of various sizes.

 

What’s going on?

Why are spammers targeting Google Analytics data? For various, no doubt sleazy, reasons. One theory is that they’re targeting unsuspecting Google Analytics users in hopes that they’ll visit the spam site in question. “What’s this Semalt site? Hmm, lemme check it out.” Referral spam is essentially a cheap & dirty way for shady companies to drive traffic to their affiliate network of gray sites. Many of these sites could be part of a malware or phishing scheme. Often, they use lookalike or otherwise authoritative-looking domains like “hulfingtonpost.com” (see what they did there?), so be cautious about what you click.

While referral spam has been around for some time, we’ve seen new, more nefarious techniques at play. Some spammers are now skipping your site completely and instead targeting Google Analytics directly. By sending HTTP requests straight to Google Analytics with your property’s tracking ID attached, they can record hits in your reports without ever loading one of your pages.

(Note to webmasters: because these requests never touch your server, .htaccess blocking is less effective against them! That’s a bummer.)

These new methods mean that spammers can now easily spoof “ghost interactions” into your account: things like fake /page-level-content/ views, fake campaign names, spoofed hostnames, even fake events. Yuck!

This problem has had a growing impact felt across all parts of the industry:

  • Sours Reporting: “Looks like referral traffic is way up this month! Oh wait, never mind. Our analyst is blaming it on robots again.”
  • Confuses Clients: “Why is this /free-cam-girls/ page showing up in the top pages report? What the hell are our SEO guys doing?”
  • Wastes Time: “Yeah, I’d love to spend the next two days trudging through 500 spam sites to make some unwieldy regular expression.”

What can you do to combat it?

There are plenty of great step-by-step guides out there that document the process in detail. Essentially, you’ll need to create a series of filters that exclude known spam referrers. Megalytic’s guide does a good job of documenting the filter process & implementation nuances in depth. If you already know what you’re doing, I’ll share some of my regexes and a list of known spam sites. Keep in mind there is a 255-character limit on each filter pattern, so you’ll likely have to create multiple filters. Pro Tip: Don’t apply these to your raw, unfiltered view, so you always keep an untouched copy of your data.
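If you keep your spam referrers as a plain list, splitting them into filter-sized chunks can be scripted. The following is a minimal sketch, not an official tool: the function name `pack_patterns`, the sample domains, and the choice to escape only dots are all assumptions made for illustration.

```python
MAX_PATTERN_LENGTH = 255  # the per-filter limit mentioned above


def pack_patterns(fragments, max_length=MAX_PATTERN_LENGTH):
    """Greedily join regex fragments with '|' so no pattern exceeds max_length."""
    patterns = []
    current = ""
    for fragment in fragments:
        if len(fragment) > max_length:
            raise ValueError(f"fragment too long for one filter: {fragment!r}")
        candidate = fragment if not current else current + "|" + fragment
        if len(candidate) <= max_length:
            current = candidate
        else:
            # Adding this fragment would overflow, so start a new filter.
            patterns.append(current)
            current = fragment
    if current:
        patterns.append(current)
    return patterns


# Hypothetical sample list; swap in your own spam referrers.
spam_domains = ["semalt.com", "darodar.com", "best-seo-services.info", "hulfingtonpost.com"]

# Escape the dots so "." only matches a literal dot.
fragments = [domain.replace(".", r"\.") for domain in spam_domains]

# A small max_length is used here only to make the splitting visible.
demo_patterns = pack_patterns(fragments, max_length=40)
for number, pattern in enumerate(demo_patterns, start=1):
    print(f"Filter {number}: {pattern}")
```

Each printed line can be pasted into its own exclude filter. With the real 255-character limit, far more domains fit into each one.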

How To Exclude Referral Spam

How To Create A Spam Filter

 

1) REGEX The Worst Offenders

This specially concocted regex (Last updated: May 19th 2015) covers most of the referral spam sites we and the community have caught so far. I recommend you add this as a filter (which cleans up data going forward) as well as a segment (which lets you strip the spam out of historical reports).

seo.*(off|exp|solu|ser)|button|semalt|porn|darodar|blackhat|lovevitaly|free.*traffic|savetube|cheap.*online|gratis\.|o-o\.|x\.ru|payday|fbdownload|ftpupload|adcash|torture|forum.*\.(i|r|u|p|g)|medispain|your-website|priceg
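Before adding a pattern like this to a view, it can help to dry-run it against a handful of referrers. The sketch below uses Python’s `re` module as a stand-in, assuming the filter does a case-insensitive partial match. The helper `is_spam` and the `.example` domains are made up for illustration, and Google Analytics’ own regex engine may differ in edge cases.

```python
import re

# The pattern from this section, split across lines only for readability.
WORST_OFFENDERS = (
    r"seo.*(off|exp|solu|ser)|button|semalt|porn|darodar|blackhat|lovevitaly|"
    r"free.*traffic|savetube|cheap.*online|gratis\.|o-o\.|x\.ru|payday|fbdownload|"
    r"ftpupload|adcash|torture|forum.*\.(i|r|u|p|g)|medispain|your-website|priceg"
)
SPAM_RE = re.compile(WORST_OFFENDERS, re.IGNORECASE)


def is_spam(referrer):
    """Return True if the pattern matches anywhere in the referrer."""
    return SPAM_RE.search(referrer) is not None


samples = [
    "semalt.semalt.com",       # spam named in this article
    "best-seo-services.info",  # spam named in this article
    "google.com",              # legitimate
    "news.example.org",        # hypothetical legitimate referrer
    "forum.example.gov",       # hypothetical legitimate referrer that still matches
]

for referrer in samples:
    label = "EXCLUDE" if is_spam(referrer) else "keep"
    print(f"{label:8}{referrer}")
```

The last sample shows why a dry run is worthwhile: broad fragments such as `forum.*\.(i|r|u|p|g)` can catch legitimate referrers too.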

2) REGEX The Wannabes & Stragglers

A second spam filter isn’t a bad idea, especially when facing the 255 character limit. Here are some more common offenders.

theguardlan|hulfingtonpost|humanorightswatch|salternativementalhealth|errors-scan|billiard-classic|bestwebsitesawards|mirobuvi|googlsucks|anticrawler|4webmasters|dipstar|inboxdollars|lombia|lumb\.co$|lomb\.co$|econom\.co$|sq01
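The `$` anchors at the end of a few of those fragments matter. Here is a small sketch, again using Python’s `re` as a stand-in for the filter engine (an assumption), with made-up test hostnames:

```python
import re

# Two of the anchored fragments from the pattern above.
anchored = re.compile(r"lumb\.co$|lomb\.co$")


def matches(referrer):
    """Partial match, the way an unanchored filter pattern would behave."""
    return anchored.search(referrer) is not None


print(matches("lumb.co"))     # the spam domain itself
print(matches("lumb.com"))    # "$" stops the fragment from matching a longer TLD
print(matches("lomb.co.uk"))  # same here
print(matches("plumb.co"))    # no "^" anchor, so anything ending in lumb.co matches
```

In other words, `$` pins the end of the referrer but not the start. If a fragment turns out to be too greedy, a leading `^` or a more specific prefix is one way to tighten it.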

3) The Nuclear Option 

.*\.ru$|.*\.pl$|.*\.ua$|.*\.ro$|.*\.ga$|.*\.info$

Caution: this removes ALL traffic from these TLDs, legitimate visitors included. Be sure you know what you’re doing before you go all ‘Great Wall of China’ mode.
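One way to gauge the damage before going nuclear is to run the pattern over an export of your referral report and total up what would disappear. This is a rough sketch: the `(referrer, sessions)` rows are invented sample data, and `audit_tld_filter` is a hypothetical helper, not part of any Google Analytics tooling.

```python
import re

NUCLEAR_RE = re.compile(
    r".*\.ru$|.*\.pl$|.*\.ua$|.*\.ro$|.*\.ga$|.*\.info$", re.IGNORECASE
)


def audit_tld_filter(rows):
    """Return (blocked rows, blocked sessions, total sessions) for a list of rows."""
    blocked = [(ref, sessions) for ref, sessions in rows if NUCLEAR_RE.search(ref)]
    blocked_sessions = sum(sessions for _, sessions in blocked)
    total_sessions = sum(sessions for _, sessions in rows)
    return blocked, blocked_sessions, total_sessions


# Invented sample data standing in for an exported referral report.
rows = [
    ("google.com", 1200),
    ("best-seo-services.info", 340),
    ("partner-shop.pl", 85),  # hypothetical legitimate Polish partner site
    ("darodar.com", 60),      # spam, but on a TLD the pattern does not cover
]

blocked, blocked_sessions, total_sessions = audit_tld_filter(rows)
for referrer, sessions in blocked:
    print(f"would exclude {referrer} ({sessions} sessions)")
print(f"{blocked_sessions} of {total_sessions} sessions would be removed")
```

In this made-up data the filter throws out a real partner along with the spam, and still misses a spammer on a .com domain, which is exactly the trade-off to weigh.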

4) Accept only known Hostnames

fake hostnames in google analytics

Fake Hostnames, the calling card of ghost data. Hostname should generally only be [yoursite].

Another good filtering method is to accept only known hostnames and block everything else. Check out Megalytic’s guide for a detailed walk-through. Make sure you list every hostname that legitimately serves your tracking code so you don’t accidentally block your own subdomains, etc. This approach seems to kill a lot of the advanced tactics spammers use when sending ghost interactions into your account, and is highly recommended.
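The include pattern for such a filter can be built from a list of your own hostnames. A minimal sketch follows: the `example.com` hostnames are placeholders, `hostname_include_pattern` is a made-up helper, and the fake values at the bottom are only assumptions about what ghost hits might report.

```python
import re


def hostname_include_pattern(hostnames):
    """Build an anchored regex that matches only the exact hostnames given."""
    escaped = [name.lower().replace(".", r"\.") for name in hostnames]
    return "^(" + "|".join(escaped) + ")$"


# Placeholder hostnames; list every domain and subdomain that carries your tracking code.
my_hostnames = ["example.com", "www.example.com", "shop.example.com"]
include_pattern = hostname_include_pattern(my_hostnames)
print(include_pattern)

include_re = re.compile(include_pattern, re.IGNORECASE)
for hostname in ["www.example.com", "(not set)", "example.com.spam.test"]:
    verdict = "keep" if include_re.search(hostname) else "drop"
    print(f"{verdict:5}{hostname}")
```

The `^` and `$` anchors are the important design choice: without them, a spoofed hostname that merely contains your domain would slip through. The resulting pattern still has to fit the 255-character limit, so very long hostname lists may need shortening, for example by matching on the root domain.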

 

This Isn’t A Long Term Solution: The New War Against Spammers

Unfortunately, the battle is just beginning. None of these fixes is a long-term solution. Spammers will almost certainly evolve their tactics. The best we can do is plug the holes with some clever hacks and adapt our filtering approach in step with spammers. Hopefully Google becomes more proactive about filtering out spam traffic, but so far they’ve been mostly quiet. I have seen little to no evidence that the “filter known bots” function does anything yet. Until then, stay vigilant, and be proactive in protecting your data integrity from those dirty spammers.

PS: Here’s a list of the Common Spam sites we’ve compiled.