December 3, 2009

Google Thinks I'm Naughty

I've been doing quite a lot of web stuff lately, and this has involved doing link checking on my site. Since it now has AdWords in it, this means my (totally unsophisticated) web crawler has been following the advertisers' links. Oops.

This afternoon I got an email from WellKnownSearchEngine saying:
It has come to our attention that invalid clicks have been generated on your Google ads, posing a financial risk to our AdWords advertisers. Please note that any activity that may artificially inflate an advertiser's costs or a publisher's earnings is strictly prohibited by our program policies.
You can imagine the rest. Dire Warnings, Serious Threats, and so on, coupled with a refusal to offer specifics. So I just thought I'd ask: what tools do you use for link-checking, and how do you avoid this issue?


Peter said...

Make your webcrawler more sophisticated?

If the URLs are redirects from your domain, blacklist them based on your robots.txt (I believe mechanize does this automatically), or try sending a HEAD request instead of a GET to cross-domain URLs.

You could also try adding rel="nofollow" to your ad links if you really have no control over the spider.
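Peter's first two suggestions can be sketched with nothing but the Python standard library. This is only an illustration under assumed names: the site root, the inline robots.txt, and the method_for helper are all invented for the example; a real checker would fetch the site's actual robots.txt.

```python
# Sketch of Peter's suggestions: obey robots.txt for on-site URLs, and
# use HEAD instead of GET for cross-domain links so off-site pages are
# checked for existence without a full fetch.
from urllib import robotparser
from urllib.parse import urlsplit

SITE = "http://example.com"          # assumed site root, for illustration

# A real checker would read http://example.com/robots.txt;
# here we parse an inline example instead.
robots = robotparser.RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /ads/",               # e.g. blacklist your ad-redirect path
])

def method_for(url: str) -> str:
    """Return 'GET', 'HEAD', or 'SKIP' for a discovered link."""
    if urlsplit(url).netloc != urlsplit(SITE).netloc:
        return "HEAD"                # off-site: check existence only
    if not robots.can_fetch("*", url):
        return "SKIP"                # on-site but disallowed by robots.txt
    return "GET"                     # on-site and allowed: crawl normally

print(method_for("http://example.com/index.html"))      # GET
print(method_for("http://example.com/ads/redirect"))    # SKIP
print(method_for("http://advertiser.example/landing"))  # HEAD
```

Note that even a HEAD request to an ad-click URL may still register on the ad server, so skipping ad links entirely (the robots.txt blacklist, or Peter's rel="nofollow" suggestion) is the safer option.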

Anonymous said...

As an AdWords advertiser, I can tell you this is extremely common. On top of this, ad content matching is quite poor (they match on _any_ word in the ad!) and the tools provided by Google to manage placements are extraordinarily primitive. For example, I have to manually copy/paste URLs to view where my ads have been shown, and there's a 36+ hour delay in even getting stats.

As a result it's very hard to both reach as many relevant sites as possible _and_ avoid the many click scams that are out there, many of which rely on hiring real live people to actually click in a seemingly "real" way.

The bottom line is that despite daily monitoring, excluding sites, altering ads, adding negative keywords, etc., we still pay about 50% of our ad spend to sites with unrelated content or that are clearly fake.

The only real available alternative is to limit ads to only specific sites, but that does end up missing quite a large number of legitimate places to advertise.

The only real solution to the problem would be to change ad content matching so that it pays attention only to keywords and negative keywords when placing ads, not to the words in the ad itself. In other words, show the ads on sites that would rank reasonably high in the Google searches where the ads are shown.

Unfortunately, Google clearly has little incentive to improve this situation and thereby cut off a significant chunk of its revenue. And there is no real alternative for online advertising; it is essentially a monopoly at this point in time.

Kenneth said...

This happened to me too.

Banned for life.

Doug said...

Google ads are embedded using JavaScript, are they not? A standard web crawler is unlikely to execute the JavaScript, so how did it get the ad links?
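Doug's point can be illustrated with a toy crawler: a parser that reads only the static HTML sees ordinary <a href> links but never anything a script would inject, since html.parser treats <script> content as raw data. The page below is invented for the example; it only mimics the general shape of a script-embedded ad unit.

```python
# Toy illustration: a non-JS-executing crawler only sees static links.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href attributes from static <a> tags only."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Invented page: one ordinary link plus a script-injected ad link.
# The <a> inside the script is never parsed as a tag, only as raw text.
page = """
<a href="/about.html">About</a>
<script>
  document.write('<a href="http://ads.example/click?id=1">Ad</a>');
</script>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)   # only the static link: ['/about.html']
```

So a crawler like this would indeed never follow the ad links unless something else (a JS-capable fetcher, or ads rendered into the static HTML) put them in its path.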

Doug Napoleone said...

I use the little-advertised 'Google Webmaster Tools' for link checking and 'search optimization' (I hate that term).

Web Master Tools

This tool was developed and released well before Analytics, and is mostly forgotten. It has full link verification, and tools for diagnosing problems with robots.txt and sitemaps.

Hmmm... looking at the PyCon 2010 site, I see that we have 6 bad links on 14 pages, some of which are coming from outside (posts to newsgroups) where there is a typo. Should add redirects to fix those.

Need to remember to use this tool...

Steve said...

Thanks, Doug. What would we do without you?