Does Yandex honor robots.txt?

I have set up a robots.txt with “User-agent: *” and appropriate Disallow rules, but I discovered in my log that the Apache2 server was under heavy load from the bots of the Russian search engine Yandex. Did I set up the robots.txt wrongly? As far as I can see, no other bots reach the places I do not want crawled.
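One way to rule out a mistake in the file itself is to run it through a standards-following parser. Below is a minimal sketch using Python's urllib.robotparser; the /private/ path, the example.com URLs and the helper name is_allowed are assumptions made up for illustration, not taken from my actual setup. It only shows how a well-behaved crawler would read the rules, not what Yandex really does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/private/page", "http://example.com/public/page"):
        print(url, is_allowed(ROBOTS_TXT, "Yandex", url))
```

If the disallowed URL comes back as not allowed for the "Yandex" agent, the wildcard group is at least syntactically fine, and the problem lies with the crawler or with something else.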

People on the internet suggest adding “User-agent: Yandex” with a Disallow right after it, but others claim that Yandex does not look at robots.txt and suggest putting the following in the .htaccess file in the document root (usually /var/www/):

SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
Order Deny,Allow
Deny from env=bad_bot
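To see which requests such a rule would catch, the pattern can be tried against a few User-Agent strings. The sketch below uses Python's re module as a stand-in for Apache's regex engine, which should behave the same for a pattern this simple (an assumption). The sample strings and the helper name is_blocked are chosen for illustration. Note that "^" anchors the match to the start of the header and "x*" means "zero or more x", so the rule only fires when the header begins with "Yande".

```python
import re

# Pattern copied from the SetEnvIfNoCase line; IGNORECASE mimics the "NoCase" part.
PATTERN = re.compile(r"^Yandex*", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern would set bad_bot for this User-Agent."""
    return PATTERN.search(user_agent) is not None


# Sample User-Agent strings, assumed for illustration only.
SAMPLES = [
    "Yandex/1.01.001 (compatible; Win16; I)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print(is_blocked(ua), ua)
```

Whether the rule catches the bot therefore depends on the exact header it sends, so it is worth checking the User-Agent column in the access log. An unanchored pattern such as "Yandex" would also match strings that carry the name in the middle.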

This seems to work for me, although I also needed something like “AllowOverride All” in the configuration file usually found in the directory /etc/apache2/sites-available/.

So this is one of the silly things you can spend your life on.
