A massive reduction in the number of spam comments

I’ve written a number of posts on ways to reduce the number of spam comments a blog receives. In this post, I’ll revisit an old method that has almost completely stopped spam comments at ardamis.com before they get to the database.

My first system for blocking WordPress comment spam was an overly complex combination of JavaScript and a challenge-response to test that the comment was being submitted by a person. The value of the action attribute in the form was not in the HTML when the page was loaded, so the form couldn’t be immediately submitted, then JavaScript was used to write the path to a renamed wp-comments-post.php file only after a certain user action was performed. I was never really satisfied with it. I didn’t like relying on JavaScript, I had doubts that any human being (meaning of any mental or physical capacity, speaking any language, etc.) could correctly answer the question, and I was concerned that any obstacle to submitting a form discourages legitimate commenting.

A few months later, I posted a simpler timestamp method for reducing WordPress comment spam that compares two timestamps and then rejects any form submission that occurrs within 60 seconds of the post page being loaded. The visitor wasn’t bothered by an additional form field solely for anti-spam and there was no JavaScript involved.

Both methods were very effective at blocking spam before it made it to the database. In the five months leading up to the implementation of the first method, Akismet was catching an average of 1418 spam comments per month. In the first five months after these methods were put in place, Akismet was catching only 54 spam comments per month. But I also noticed a reduction in legitimate comments, from an average of 26 per month to 20 per month, which led me to suspect that real visitors attempting to leave comments were being discouraged from doing so.

The timestamp method required changing a core file, which was overwritten each time WordPress was updated. As time went on, I forgot to replace the file after upgrading WordPress, so the protection was lost and I once again had only Akismet blocking spam. A few months later, while doing work on the database in an attempt to speed up WordPress, I happened to check my historical stats and found that Akismet had detected 4,144 comments in July, 2010. Yikes. It was time to revisit these old methods.

At 2:30 AM on August 1, 2010, I again implemented my timestamp method, but this time I also renamed the wp-comments-post.php file that processes the form. I changed my theme’s comments.php file to submit the form to the new page, deleted the wp-comments-post.php file from the server and tested to make sure that comments could still be submitted. And then I waited to see what would happen.

The effect was pretty amazing. The spam had almost completely stopped.

My Akismet stats look like this:

Date Spam
7.30.10 192
7.31.10 196
8.1.10 32
8.2.10 0
8.5.10 4
8.8.10 4
8.10.10 4
8.11.10 4
8.13.10 0
8.14.10 0

(I don’t know why so many dates in August are skipped in the log, but whatever.)

Fast, but only partial protection

The quick and easy way to reduce the number of spam comments that your WordPress blog receives is to merely change the location of the comment form processing script.

  1. Rename wp-comments-post.php to anything else. I like using a string of random hexadecimal characters, like: z1t0zVGuaCZEi.php.
  2. Edit your current theme’s comments.php so that the form is submitted to this new file.
  3. Upload these files to their respective directories, then delete the wp-comments-post.php file from your server.

This method works well to stop spam submitted by bots that assume the comment form processing script used by WordPress is always at the same location. More advanced bots will read the actual location of the file from the action attribute of the form element, but that can be countered by using either the JavaScript or timestamp method.

Access log analysis

To illustrate the effectiveness of the renamed wp-comments-post file + timestamp check, below are some events from my 06 August 2010 access log.

Bot defeated by renamed file alone

Here is a form submission to the non-existent wp-comments-post file that occurs 2 seconds after the post page is requested.

173.242.112.44 - - [06/Aug/2010:23:21:37 -0700] "GET www.ardamis.com/2007/07/12/defeating-contact-form-spam/ HTTP/1.0" 200 32530 "http://www.google.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"
173.242.112.44 - - [06/Aug/2010:23:21:39 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "//ardamis.com/2007/07/12/defeating-contact-form-spam/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"

The bot is sent a 404 HTTP status code, which is widely understood to mean that the page isn’t there and you can stop asking for it. But that doesn’t stop this bot! Two minutes later, it’s back at another page, trying again.

173.242.112.44 - - [06/Aug/2010:23:23:01 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.0" 200 101259 "http://www.google.com" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"
173.242.112.44 - - [06/Aug/2010:23:23:05 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "//ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"

Again, it gets a 404 back. Some bots never learn.

Bot defeated by timestamp check

Here is a form submission to the renamed wp-comments-post file that occurs 4 seconds after the post page is requested.

91.201.66.6 - - [06/Aug/2010:23:30:41 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.1" 200 21787 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"
91.201.66.6 - - [06/Aug/2010:23:30:45 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1227 "//ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"

The 500 HTTP header indicates that this submission was denied and the comment never made it to the database. This access log doesn’t indicate which check stopped the POST (eg: the email validation or the timestamp function), but my money is on the timestamp.

Here’s another form submission to the renamed wp-comments-post file that occurs one second after the post page is requested. Speed reader or bot?

95.220.185.210 - - [06/Aug/2010:23:56:54 -0700] "GET www.ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/ HTTP/1.1" 200 23977 "-" "Opera/9.01 (Windows NT 5.0; U; en)"
95.220.185.210 - - [06/Aug/2010:23:56:55 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1213 "//ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/" "Opera/9.01 (Windows NT 5.0; U; en)"

The submission is rejected.

Taking the method even further

To take this method even further, one could send a 200 OK header even when the comment is blocked, so the bots never know their mission failed. But this seems unnecessary at this point, as it doesn’t appear that they change their behavior after being sent a 404 error, or that they try again after being sent a 500 error. It also makes it harder to figure out from the access logs which comments were rejected and for what reason.

If you still want to do this, first implement the timestamp method, then make the following modifications.

Sending a 200 header

$comment_timestamp    = trim($_POST['timestamp']);
$submitted_timestamp  = time();

if ( $comment_timestamp == '' ) {
// If the value for $_POST['timestamp'] is an empty string, exit (the form wasn't submitted by the theme's comments.php)
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: It looks like this form was not submitted by the form at ' . get_option('siteurl') . '.</p>';
	exit;
}
if ( $submitted_timestamp - $comment_timestamp < 60 ) {
// If the form was submitted within 60 seconds of page load, exit
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: The comment was posted too soon after the page was loaded.  Please press the Back button on your browser and try again in a few seconds.</p>'; 
	exit;
}
// If the form was submitted more than 10 minutes after page load, die
if ( $submitted_timestamp - $comment_timestamp > 600 ) {
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: You waited too long before posting a comment.</p>';
	exit;
}

One could also write a record to a database each time the old wp-comments-post.php file is requested or any of the timestamp checks block a form submission, and pretty quickly generate a list of IP addresses for a black list. At the same time, one could log which timestamp check caught the spam attempt, which is interesting enough that I’ll probably do it eventually.