I have set up a robots.txt with "User-agent: *" and appropriate Disallow rules, but I discovered in my logs that the Apache2 server was under heavy load from the bots of the Russian search engine Yandex. Did I set up the robots.txt wrongly? As far as I can see, no other bots get to the place I do not want crawled.
People on the internet suggest adding "User-agent: Yandex" with a Disallow right after it, but others claim that Yandex does not respect robots.txt and suggest putting the following in the .htaccess file in the document root (usually /var/www/):
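For reference, the robots.txt variant would look something like this (the path /private/ is just a placeholder for whatever you want kept out of the crawl):

```
# Block Yandex bots specifically
User-agent: Yandex
Disallow: /private/

# All other bots: same rule
User-agent: *
Disallow: /private/
```

Whether this helps depends entirely on the bot honoring robots.txt, which is exactly what is in dispute here.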
SetEnvIfNoCase User-Agent "Yandex" bad_bot
Order Deny,Allow
Deny from env=bad_bot

(Note that the often-posted pattern "^Yandex*" is a broken regex: the real Yandex user-agent string starts with "Mozilla/5.0 (compatible; YandexBot/...", so an anchored pattern never matches. An unanchored "Yandex" matches anywhere in the string.)
This seems to work for me, although I also needed something like "AllowOverride All" in the site configuration file, usually found in the directory /etc/apache2/sites-available/
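For anyone hitting the same wall: .htaccess files are silently ignored unless the enclosing <Directory> block permits overrides. A minimal sketch of the relevant part of the vhost file (the filename 000-default.conf and the /var/www paths are assumptions; adjust to your setup):

```apacheconf
# /etc/apache2/sites-available/000-default.conf (name assumed)
<VirtualHost *:80>
    DocumentRoot /var/www
    <Directory /var/www>
        # Without this, the Deny rules in .htaccess are never read
        AllowOverride All
    </Directory>
</VirtualHost>
```

Remember to reload Apache afterwards (e.g. "service apache2 reload") for the change to take effect.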
So this is one of the silly things you can spend your life on.