15% fewer visitors, and I’m happy
A few weeks ago, I was looking at my site statistics and discovered I was suddenly vastly more popular than I had ever been. From my previous position of around 400-500 hits per day, I was getting somewhere in the region of 2,000. Wow. Well, it’s nice to be popular. However, on closer inspection it became apparent that about 55% of the traffic was from web crawlers.
Now, I’ve got nothing against Google, Yahoo, MSN etc. crawling my site, because frankly I want nice high listings in their search engines, but I was suspicious that about 80% of those crawlers were unknown. It could just be that my free stats package fails to recognise some perfectly legitimate bots. That seemed plausible, but I reckoned it was also entirely possible that some unscrupulous web crawler chappies had come across my site from somewhere and were frantically crawling all over it, presumably either to try and hijack my email form, leave me lots of comments telling me where I could buy tramadol (I don’t want any, okay? I don’t even know what it is, to be honest), or scour my site for email addresses and/or private things.
Now, Google, Yahoo, MSN and most legitimate bots will honour the robots.txt exclusion protocol, so one way to identify a bad bot is simply that it doesn’t. In fact, the only legitimate organisation I’m aware of with a bot that ignores the exclusion protocol is the Sitemorse testing bot; the theory, apparently, is that it is testing all content for accessibility and won’t allow you to exclude some sections (although not everyone is happy with its behaviour, and indeed it prompted Dan Champion to write an article on blocking Sitemorse and other unwanted robots). Some sites have even set up bot traps that generate random links into excluded areas, which in turn generate more random links, so the good bots following the exclusion protocol don’t go in and the bad bots go round in circles. But that’s still using up your bandwidth.
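For anyone who hasn’t looked at one, the exclusion protocol is just a plain text file called robots.txt sitting at the root of the site. Something along these lines (the paths here are made-up examples rather than my actual setup, and /trap/ stands in for the bot-trap idea mentioned above):

    # served as /robots.txt from the site root
    User-agent: *
    # keep well-behaved crawlers out of the admin area
    Disallow: /wp-admin/
    # hypothetical bot-trap directory: good bots never enter it,
    # so anything that does has ignored the rules
    Disallow: /trap/

Anything that requests pages under a Disallow’d path despite this is, by definition, not playing by the rules.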
What I wanted was a method that would just cut out a lot of this crap. And then I found the Bad Behaviour plug-in for WordPress, which does just the job. It analyses incoming requests to see whether they are likely to be from a bad bot (are they breaking the exclusion protocol, are they claiming to be Google when they aren’t, etc.) and returns a simple screen telling them why their access has been denied, and what to do if they are legitimate. In the first week, this blocked around 90 bad bots visiting my site. Every week since, it’s blocked about 60, as it seems many bad bots don’t return once they’ve been kicked out the first time.
Of course, it’s possible I’m blocking some legitimate users too, since someone could have changed their user agent string to look like Googlebot in order to get more content from a site (some sites serve additional content only to registered members, but also serve it to Googlebot in order to get higher rankings; cloaking yourself as Googlebot is therefore naughty, but would get you access to all the content). However, in these cases the error message "you are claiming to be a major search engine but do not appear to be" is sufficiently descriptive to let people know what the problem is.
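For the curious, a check like that "claiming to be a major search engine" one typically works by a DNS round trip on the client’s IP address. Bad Behaviour itself is a PHP plug-in and I haven’t reproduced its actual code; this is just a rough Python sketch of the general idea, with a made-up example IP:

    import socket

    def claims_googlebot(user_agent: str) -> bool:
        # the request *says* it is Googlebot...
        return "googlebot" in user_agent.lower()

    def is_real_googlebot(ip: str) -> bool:
        # ...so reverse-resolve the IP, check the hostname belongs to Google,
        # then forward-resolve that hostname and make sure it maps back to
        # the same IP (otherwise the reverse record could be faked).
        try:
            host, _, _ = socket.gethostbyaddr(ip)
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]
        except socket.gaierror:
            return False
        return ip in forward_ips

    # 203.0.113.7 is a documentation address, so it fails the round trip and a
    # plug-in doing this check would serve its explanatory denial page instead.
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    if claims_googlebot(ua) and not is_real_googlebot("203.0.113.7"):
        print("403: you are claiming to be a major search engine but do not appear to be")

A genuine Googlebot passes both halves of the round trip, so the real search engines never see the block.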
On the whole, I’ve been getting about 15% fewer page hits since I installed Bad Behaviour, and the only category showing a reduction is the unidentified bots; the actual browser-based traffic seems to have increased slightly. So there you go: install Bad Behaviour, get rid of the crap, and don’t put off your regular visitors.