Denial of Service crawl on the Brede Wiki?

Just as I was about to download a meta-analytic comma-separated values file from the Brede Wiki, the server hosting the wiki got into deep trouble. Though there was some response, it was really slow. I had to do a hard reset. When I looked in the log files I could see something like “trx0undo.c … Mutex at … created file trx0rseg.c” and “InnoDB: Warning: a long semaphore wait”. I had a similar problem yesterday.

I was afraid that this might be a hard disk issue, but the hard disk utility command “smartctl -a /dev/hda1” reported nothing suspicious.

If one googles the error messages, a few bug reports and questions show up, but apparently nothing that could help me.

When I looked in the Apache log (/var/log/apache2/access.log) I could see aggressive downloading from a specific foreign university computer, with several requests made per second around the time the server got into trouble. So it might be that MediaWiki/MySQL has a problem there: it is not able to handle that many requests. I wrote the following email to the university department (a quick way to count the requests in the log follows after the email):

Dear … of Computer Science,

I am recording aggressive downloads on my Web server from 999.999.999.999, which resolves to …, so it must be a computer at your site.

The amount of downloads unfortunately makes my server stall; it is a rather old computer that cannot handle much load. It is probably a bot (perhaps constructed by one of your students) that has been set up to crawl my site. I hope you can contact the person responsible for the bot and ask him to moderate the download rate. At the moment I am getting several requests per second from the 999.999.999.999 computer.

The person behind the bot has set the user agent field wrong. At the moment it displays “firefox 3.0”, which I very much doubt.

If it is not possible for you to contact the person, I might have to set up a firewall rule blocking the University of … from accessing my Web server.

Sincerely
Finn
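
For the record, the request rate can be counted from the access log with something like the Python sketch below. It is only a rough sketch: it assumes the default Apache combined log format and the log path mentioned above, and the "top five" cut-off is arbitrary.

import re
from collections import Counter

LOGFILE = "/var/log/apache2/access.log"

# A combined-format line starts with: client-IP identd user [timestamp] ...
line_re = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

requests_per_ip = Counter()
requests_per_ip_second = Counter()

with open(LOGFILE) as logfile:
    for line in logfile:
        match = line_re.match(line)
        if match is None:
            continue
        ip, timestamp = match.groups()
        requests_per_ip[ip] += 1
        # The Apache timestamp has one-second resolution, so keying on
        # (ip, timestamp) buckets the requests by second.
        requests_per_ip_second[(ip, timestamp)] += 1

print("Busiest clients overall:")
for ip, count in requests_per_ip.most_common(5):
    print("%8d  %s" % (count, ip))

print("Highest number of requests within a single second:")
for (ip, timestamp), count in requests_per_ip_second.most_common(5):
    print("%8d  %s  at %s" % (count, ip, timestamp))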

I now also added “Crawl-delay: 3” to the robots.txt file. I do not know how well different crawlers implement that directive.
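
The directive goes into robots.txt along these lines (assuming it should apply to all user agents; the delay is in seconds for the crawlers that honor it):

User-agent: *
Crawl-delay: 3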

If it is the case that the request rate has caused the problem, I am a bit puzzled that MediaWiki/MySQL cannot handle that rate. It is a fairly old computer, but it should fail gracefully. Maybe I need to go over the configuration. I suppose the issue might be around “$wgDisableCounters”: I believe the page view counters require a database write on every page read. It is nice to have the download statistics, but they are not essential.
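
If the counters turn out to be the culprit, turning them off should just be a one-line change in LocalSettings.php. I have not yet verified that this alone fixes the load problem:

$wgDisableCounters = true;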
