This issue is FINALLY getting the attention it deserves:
I had a closer look at many of the blogs concerned that had spammy content (pages promoting credit cards, pharmaceuticals and the like), and I realized that if you go to the root domain they are all legitimate blogs. They are not scraper blogs that were being auto-generated with AdSense / affiliate links, which was extremely curious, and actually reminiscent of something that hit home a few months ago.
A few months ago, this blog got hacked, but in a sneaky way. Not only did the hackers insert “invisible” code into my template, so that I was getting listed in Google for all manner of sneaky (and NSFW) terms and people could click on those links with the hacker getting the affiliate cash, but said hackers *also* inserted fake templates into my WordPress theme.
Oddly enough, Tailrank picks up on this spam because of our clustering algorithm. We cluster common links and terms across our blog index and promote the resulting stories to our front page.
Since we ‘trust’ blogs based on their past behavior, when major A-list blogs like ZDNet get owned we still believe the links they carry are legitimate.
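To make the failure mode concrete, here is a rough sketch of trust-weighted link clustering. It is not Tailrank's actual algorithm: the function name `rank_links`, the trust values, the default weight of 0.1 for unknown blogs, and the example hostnames are all assumptions made up for illustration.

```python
from collections import defaultdict

def rank_links(posts, trust, default_trust=0.1):
    """Score each link by the summed trust of the distinct blogs linking to it.

    posts: list of (blog_host, [outbound_links]) pairs (hypothetical shape).
    trust: dict mapping blog_host to a trust weight earned from past behavior.
    """
    linking_blogs = defaultdict(set)
    for blog, links in posts:
        for link in links:
            linking_blogs[link].add(blog)

    scores = {
        link: sum(trust.get(blog, default_trust) for blog in blogs)
        for link, blogs in linking_blogs.items()
    }
    # Highest-scoring links are the ones that would be promoted.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Two trusted blogs have been hacked and now carry the same spam link.
posts = [
    ("zdnet.example", ["http://pills.example/buy"]),
    ("alist.example", ["http://pills.example/buy"]),
    ("smallblog.example", ["http://news.example/story"]),
]
trust = {"zdnet.example": 1.0, "alist.example": 0.9}

ranked = rank_links(posts, trust)
```

In this toy run the spam link outranks the genuine story, because the only signal is who links to it, and the blogs doing the linking have a good history.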
If we had a smaller index this might be a bit easier to handle, but we’re indexing 12M blogs within Tailrank and on Spinn3r.
Another way around this, of course, would be to blacklist every blog running WordPress 2.2 or earlier, but we’re talking millions of blogs here and we don’t want to unfairly harm anyone.
To date our approach has been to wait until Tailrank has identified the spam, and then blacklist any blogs that have been compromised.
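A reactive blacklist of this kind might look something like the sketch below. Again, this is only an illustration under assumptions: the helper names `host_of` and `drop_blacklisted`, the post format, and the hostnames are hypothetical and are not taken from Tailrank or Spinn3r.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()

def drop_blacklisted(posts, blacklist):
    """Keep only posts whose source blog is not on the blacklist.

    posts: list of (post_url, [outbound_links]) pairs (hypothetical shape).
    blacklist: set of hostnames already identified as compromised.
    """
    return [(url, links) for url, links in posts if host_of(url) not in blacklist]

blacklist = {"hacked.example"}
posts = [
    ("http://Hacked.example/2008/04/post", ["http://pills.example/buy"]),
    ("http://clean.example/2008/04/post", ["http://news.example/story"]),
]

kept = drop_blacklisted(posts, blacklist)
```

The weakness is visible in the code: a host only lands in `blacklist` after its spam has already been seen, so every newly hacked blog gets through at least once.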
Unfortunately, this is a war of attrition: the spammer just spends a few more days hacking another dozen or so sites.
The only positive aspect of this is that it’s encouraging people to upgrade to WordPress 2.5.
We’re also working on some secondary algorithms to catch this a bit sooner, and we’ll probably ship these in Spinn3r 2.5, which is due shortly.