Tag Archives: comment spam

Akismet is not enough to stop spam comments

I like Akismet, and it’s undeniably effective in stopping the vast majority of spam, but it adds a huge number of comments to the database and a very small percentage of comments still get through to my moderation queue.

It’s annoying to find comments in my moderation queue, but what I really object to is the thousands of records that are added to the database each month that I don’t see.

In the screenshot below, January through April show very few spam comments being detected by Akismet. This is because I was using my cache-friendly method for reducing WordPress comment spam to block spam comments even before Akismet analyzed them.

In May, I moved hosting providers to asmallorange.com and started with a fresh install of WordPress without implementing my custom spam method, which admittedly was not ideal because it involved changing core files. This left only Akismet between the spammers and my WordPress database. Since that time, instead of 150 or fewer spam comments per month making it into my WordPress database, Akismet was on pace to let in over 10,000.

So, in the spirit of fresh starts and doing things the right way, I created a WordPress plug-in that uses the same timestamp method. It’s actually exactly the same JavaScript and PHP code, just in plug-in form, so it’s not bound to any core files or theme files.

Ardamis in 2012 – new look, more microdata, faster code

Just a few weeks behind schedule, but a long time in the works, I’ve finally pushed the new WordPress theme for Ardamis live. Basic and elegant (I’m trying to establish a trend here), the theme also should outperform its predecessors in both page load times and SEO-potential. The index and archive pages should appear more consistent, and all pages should provide more complete structured data markup (schema.org as well as microformats.org). The comment form has been outfitted with an improved approach to reducing comment spam.

The new theme is pretty light on the graphics, due to increased browser support for and subsequently greater use of CSS3 goodness for box shadows and gradients. I’ve reduced the number of image files to two: a background and a sprites file.

Only half-implemented in the previous theme, the new look, “Joy”, makes much better use of structured data markup, or microdata. Google is absolutely looking for ways to display your pages’ semantic markup in its results, so you may as well get on board.

The frequency of spam comments increased dramatically over the past two months, according to my Akismet stats, so I’ve gone back to the drawing board and developed a better front-line defense against them. The new method should be more opaque to bots that parse JavaScript while still being invisible to human visitors leaving legitimate comments.

In sum, I think Ardamis should be leaner, faster, and smarter (and maybe prettier) in 2012 than ever before.

A cache-friendly method for reducing WordPress comment spam

Update 2015-01-02: About a month ago, in early December, 2014, Google announced that it was working on a new anti-spam API that is intended to replace the traditional CAPTCHA challenge as a method for humans to prove that they are not robots. This is very good news.
This week, I noticed that Akismet is adding a hidden input field to the comment form that contains a timestamp (although the plugin’s PHP puts the initial INPUT element within a P element set to DISPLAY:NONE, when the plugin’s JavaScript updates the value with the current timestamp, the INPUT element jumps outside of that P element). The injected code looks something like this:
<input type="hidden" id="ak_js" name="ak_js" value="1420256728989">
I haven’t yet dug into the Akismet code to discover what it’s doing with the timestamp, but I’d be pleased if Akismet is attempting to differentiate humans from bots based on behavior.

Update 2015-01-10: To test the effectiveness of the current version of Akismet, I disabled the anti-spam plugin described in this post on 1/2/2015 and re-enabled it on 1/10/2015. In the span of 8 days, Akismet identified 1,153 spam comments and missed 15 more. These latest numbers continue to support my position that Akismet is not enough to stop spam comments.

In the endless battle against WordPress comment spam, I’ve developed and then refined a few different methods for preventing spam from getting to the database to begin with. My philosophy has always been that a human visitor and a spam bot behave differently (after all, the bots we’re dealing with are not Nexus-6 model androids here), and an effective spam-prevention method should be able to recognize the differences. I also have a dislike for CAPTCHA methods that require a human visitor to prove, via an intentionally difficult test, that they aren’t a bot. The ideal method, I feel, would be invisible to a human visitor, but still accurately identify comments submitted by bots.

Spam on ardamis.com in early 2012 - before and after

A brief history of spam fighting

The most successful and simple method I found was a server-side system for reducing comment spam by using a handshake method involving timestamps on hidden form fields that I implemented in 2007. The general idea was that a bot would submit a comment more quickly than a human visitor, so if the comment was submitted too soon after the post page was loaded, the comment was rejected. A human caught in this trap would be able to click the Back button on the browser, wait a few seconds, and resubmit. This proved to be very effective on ardamis.com, cutting the number of spam comments intercepted by Akismet per day to nearly zero. For a long time, the only problem was that it required modifying a core WordPress file: wp-comments-post.php. Each time WordPress was updated, the core file was replaced. If I didn’t then go back and make my modifications again, I would lose the spam protection until I made the changes. As it became easier to update WordPress (via a single click in the admin panel) and I updated it more frequently, editing the core file became more of a nuisance.

A huge facepalm

When Google began weighting page load times as part of its ranking algorithm, I implemented the WP Super Cache caching plugin on ardamis.com and configured it to use .htaccess and mod_rewrite to serve cache files. Page load times certainly decreased, but the amount of spam detected by Akismet increased. After a while, I realized that this was because the spam bots were submitting comments from static, cached pages, and the timestamps on those pages, which had been generated server-side with PHP, were already minutes old when the page was requested. The form processing script, which normally rejects comments that are submitted too quickly to be written by a human visitor, happily accepted the timestamps. Even worse, a second function of my anti-spam method also rejected comments that were submitted 10 minutes or more after the page was loaded. Of course, most of the visitors were being served cached pages that were already more than 10 minutes old, so even legitimate comments were being rejected. Using PHP to generate my timestamps obviously was not going to work if I wanted to keep serving cached pages.
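
To make the failure concrete, here is a small sketch of the two checks applied to a cached page. The 60-second minimum and 10-minute maximum are the thresholds mentioned in these posts; the function name and the sample timings are invented for illustration, and times are in seconds, as PHP's time() would produce.

```javascript
// Thresholds from the posts: reject under 60 seconds, reject over 10 minutes.
const MIN_SECONDS = 60;
const MAX_SECONDS = 600;

// Hypothetical stand-in for the server-side check in the form processing script.
function checkElapsed(pageTimestamp, submitTimestamp) {
  const elapsed = submitTimestamp - pageTimestamp;
  if (elapsed < MIN_SECONDS) return 'rejected: too fast';
  if (elapsed > MAX_SECONDS) return 'rejected: too slow';
  return 'accepted';
}

// The page is rendered once at t = 0 and then served from the cache,
// so every visitor receives the same stale timestamp.
const cachedAt = 0;

// A bot requests the cached copy 5 minutes later and posts 2 seconds after that.
const botResult = checkElapsed(cachedAt, 5 * 60 + 2);

// A human requests the cached copy 20 minutes later and spends 90 seconds writing.
const humanResult = checkElapsed(cachedAt, 20 * 60 + 90);

console.log(botResult);   // accepted
console.log(humanResult); // rejected: too slow
```

The check is measuring the age of the cache file rather than the time the visitor spent on the page, which is why both outcomes are backwards.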

JavaScript to the rescue

Generating real-time timestamps on cached pages requires JavaScript. But instead of a reliable server clock setting the timestamp, the time is coming from the visitor’s system, which can’t be trusted to be accurate. Merely changing the comment form to use JavaScript to generate the first timestamp wouldn’t work, because verifying a timestamp generated on the client-side against one generated server-side would be disastrous.

Replacing the PHP-generated timestamps with JavaScript-generated timestamps would require substantial changes to the system.

Traditional client-side form validation using JavaScript happens when the form is submitted. If the validation fails, the form is not submitted, and the visitor typically gets an alert with suggestions on how to make the form acceptable. If the validation passes, the form submission continues without bothering the visitor. To get our two timestamps, we can generate a first timestamp when the page loads and compare it to a second timestamp generated when the form is submitted. If the visitor submits the form too quickly, we can display an alert showing the number of seconds remaining until the form can be successfully submitted. This client-side validation should hopefully be invisible to most visitors who choose to leave comments, but at the very least, far less irritating than a CAPTCHA system.

It took me two tries to get it right, but I’m going to discuss the less successful method first to point out its flaws.

Method One (not good enough)

Here’s how the original system flowed.

Generate a first JS timestamp when the page is loaded.
Generate a second JS timestamp when the form is submitted.
Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, write a pre-determined passcode to a hidden INPUT element, then submit the form.
After the form contents are sent to the server, use server-side logic to verify that the passcode is present and valid.

The problem was that it seemed that certain bots could parse JavaScript enough to drop the pre-determined passcode into the hidden form field before submitting the form, circumventing the timestamps completely and defeating the system.

Because the timestamps were only compared on the client side, it also failed to adhere to one of the basic tenets of form validation – that the input must be checked on both the client side and the server side.

Method Two (better)

Rather than having the server-side validation be merely a check to confirm that the passcode is present, method two compares the timestamps a second time on the server side. Instead of a single hidden input, we now have two – one for each timestamp. This is intended to prevent a bot from figuring out the ultimate validation mechanism by simply parsing the JavaScript. Finally, the hidden fields are not in the HTML of the page when it’s sent to the browser, but are added to the form via jQuery, which makes it easier to implement and may act as another layer of obfuscation.

Generate a first JS timestamp when the page is loaded and write it to a hidden form field.
Generate a second JS timestamp when the form is submitted and write it to a hidden form field.
Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, submit the form (client-side validation).
On the form processing page, use server-side logic to compare the timestamps a second time (server-side validation).

This timestamp handshake works more like it did in the proven-effective server-side-only method. We still have to pass something from the comment form to the processing script, but it’s not too obvious from the HTML what is being done with it. Furthermore, even if a bot suspects that the timestamps are being compared, there is no telling from the HTML what the threshold is for distinguishing a valid comment from one that is invalid. (The JavaScript could be parsed by a bot, but the server-side check cannot be, making it possible to require a slightly longer amount of time to elapse in order to pass the server-side check.)
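
The idea of a hidden, slightly stricter server-side threshold can be sketched as follows. This is only an illustration under stated assumptions: the 12-second server threshold and the function names are hypothetical, and the logic is written in JavaScript for readability even though the real server-side check runs in PHP.

```javascript
// Threshold used by the client-side script; a bot can read this from the page.
const CLIENT_THRESHOLD_MS = 10000;
// Hypothetical, slightly longer threshold that exists only on the server.
const SERVER_THRESHOLD_MS = 12000;

// Client-side validation: runs in the browser when the form is submitted.
function passesClientCheck(ts1, ts2) {
  return ts2 - ts1 > CLIENT_THRESHOLD_MS;
}

// Server-side validation: the values arrive as strings (or not at all).
function passesServerCheck(rawTs1, rawTs2) {
  const ts1 = Number(rawTs1);
  const ts2 = Number(rawTs2);
  // Missing or non-numeric fields fail outright.
  if (!Number.isFinite(ts1) || !Number.isFinite(ts2)) return false;
  return ts2 - ts1 >= SERVER_THRESHOLD_MS;
}

const pageLoad = 1000000;
const botSubmit = pageLoad + 10001;   // waits just past the visible threshold
const humanSubmit = pageLoad + 45000; // takes 45 seconds to write a comment

console.log(passesClientCheck(pageLoad, botSubmit));                   // true
console.log(passesServerCheck(String(pageLoad), String(botSubmit)));   // false
console.log(passesServerCheck(String(pageLoad), String(humanSubmit))); // true
```

The trade-off is that a fast human who submits between the two thresholds passes the client check and then sees the server's error page. Both timestamps also come from the client, so this is obscurity rather than proof: a bot that guesses a long enough gap can still forge the fields.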

The same downside plagued me

For a long time, far longer than I care to admit, I stubbornly continued to modify the core file wp-comments-post.php to provide the server-side processing. But creating the timestamps and parsing them with a plug-in turned out to be a simple matter of two functions, and in June of 2013 I finally got around to doing it the right way.

The code

The plugin, in all its simplicity, is only 100 lines. Just copy this code into a text editor, save it as a .php file (the name isn’t important) and upload it to the /wp-content/plugins directory and activate it. Feel free to edit it however you like to suit your needs.

<?php

/*
Plugin Name: Timestamp Comment Filter
Plugin URI: //ardamis.com/2011/08/27/a-cache-proof-method-for-reducing-comment-spam/
Description: This plugin measures the amount of time between when the post page loads and the comment is submitted, then rejects any comment that was submitted faster than a human probably would or could.
Version: 0.1
Author: Oliver Baty
Author URI: //ardamis.com

    Copyright 2013  Oliver Baty  (email : obbaty@gmail.com)

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

// http://wordpress.stackexchange.com/questions/6723/how-to-add-a-policy-text-just-before-the-comments
function ard_add_javascript(){

	?>
	
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    ardGenTS1();
});
 
function ardGenTS1() {
    // prepare the form
    $('#commentform').append('<input type="hidden" name="ardTS1" id="ardTS1" value="1" />');
    $('#commentform').append('<input type="hidden" name="ardTS2" id="ardTS2" value="1" />');
    $('#commentform').attr('onsubmit', 'return validate()');
    // set a first timestamp when the page loads
    var ardTS1 = (new Date).getTime();
    document.getElementById("ardTS1").value = ardTS1;
}
 
function validate() {
    // read the first timestamp
    var ardTS1 = document.getElementById("ardTS1").value;
//  alert ('ardTS1: ' + ardTS1);
    // generate the second timestamp
    var ardTS2 = (new Date).getTime();
    document.getElementById("ardTS2").value = ardTS2;
//  alert ('ardTS2: ' + document.getElementById("ardTS2").value);
    // find the difference
    var diff = ardTS2 - ardTS1;
    var elapsed = Math.round(diff / 1000);
    // never report 0 seconds remaining while the check below still fails
    var remaining = Math.max(1, 10 - elapsed);
//  alert ('diff: ' + diff + '\n\n elapsed:' + elapsed);
    // check whether enough time has elapsed
    if (diff > 10000) {
        // submit the form
        return true;
    }else{
        // display an alert if the form is submitted within 10 seconds
        alert("This site is protected by an anti-spam feature that requires 10 seconds to have elapsed between the page load and the form submission. \n\n Please close this alert window.  The form may be resubmitted successfully in " + remaining + " seconds.");
        // prevent the form from being submitted
        return false;
    }
}
</script>
	
	<?php
}

add_action('comment_form_before','ard_add_javascript');

// http://wordpress.stackexchange.com/questions/89236/disable-wordpress-comments-api
function ard_parse_timestamps(){

	// Set up the elapsed time, in milliseconds, that is the threshold for determining whether a comment was submitted by a human
	$intThreshold = 10000;
	
	// Set up a message to be displayed if the comment is blocked
	$strMessage = '<strong>ERROR</strong>:  this site uses JavaScript validation to reduce comment spam by rejecting comments that appear to be submitted by an automated method.  Either your browser has JavaScript disabled or the comment appeared to be submitted by a bot.';
	
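	// Cast to float: millisecond timestamps overflow 32-bit integers, and non-numeric input becomes 0 instead of raising an error in the subtraction
	$ardTS1 = ( isset($_POST['ardTS1']) ) ? (float) trim($_POST['ardTS1']) : 1;
	$ardTS2 = ( isset($_POST['ardTS2']) ) ? (float) trim($_POST['ardTS2']) : 2;
	$ardTS = $ardTS2 - $ardTS1;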
	 
	if ( $ardTS < $intThreshold ) {
	// If the difference of the timestamps is not more than 10 seconds, exit
		wp_die( __($strMessage) );
	}
}
add_action('pre_comment_on_post', 'ard_parse_timestamps');

?>

That’s it. Not so bad, right?

Final thoughts

The screen-shot at the beginning of the post shows the number of spam comments submitted to ardamis.com and detected by Akismet each day from the end of January, 2012, to the beginning of March, 2012. The dramatic drop-off around Jan 20 was when I implemented the method described in this post. The flare-up around Feb 20 was when I updated WordPress and forgot to replace the modified core file for about a week, illustrating one of the hazards of changing core files.

If you would rather not add any hidden form fields to the comment form, you could consider appending the two timestamps to the end of the comment_post_ID field. Because its contents are cast as an integer in wp-comments-post.php when the value of the $comment_post_ID variable is set, WordPress won’t be bothered by the extra data at the end of the field, so long as the post ID comes first and is followed by a space. You could then just explode the contents of the comment_post_ID field on the space character, then compare the last two elements of the array.
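
As a rough sketch of that packing and unpacking, assuming the integer cast behaves as described above in your version of WordPress: the function names here are hypothetical, and the unpacking is shown in JavaScript only to illustrate the logic that the PHP explode() step would perform.

```javascript
// Build the value for the comment_post_ID field: post ID first, then a space,
// then the two timestamps.
function packPostId(postId, ts1, ts2) {
  return postId + ' ' + ts1 + ' ' + ts2;
}

// Split the field on spaces and compare the last two elements.
function unpackPostId(fieldValue) {
  const parts = String(fieldValue).trim().split(' ');
  if (parts.length < 3) return null; // timestamps were not appended
  const ts2 = Number(parts[parts.length - 1]);
  const ts1 = Number(parts[parts.length - 2]);
  // parseInt stops at the first non-digit, much like an integer cast would.
  const postId = parseInt(parts[0], 10);
  if (![postId, ts1, ts2].every(Number.isFinite)) return null;
  return { postId: postId, ts1: ts1, ts2: ts2, elapsedMs: ts2 - ts1 };
}

const packed = packPostId(42, 1000, 12000);
const unpacked = unpackPostId(packed);
console.log(packed);             // 42 1000 12000
console.log(unpacked.elapsedMs); // 11000
```

A field that contains only the post ID yields null, which a processing script could treat the same way as a missing timestamp.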

If you don’t object to meddling with a core file in order to obtain a little extra protection, you can rename the wp-comments-post.php file and change the path in the comment form’s action attribute. I’ve posted logs showing that some bots just try to post spam directly to the wp-comments-post.php file, so renaming that file is an easy way to cut down on spam. Just remember to come back and delete the wp-comments-post.php file each time you update WordPress.

A chart illustrating the reduction in comment spam at ardamis.com

In August, 2010, I described a simple method for dramatically reducing the number of spam comments that are submitted to a WordPress blog. The spam comments are rejected before they are checked by Akismet, so they never make it into the database at all.

Now, a few months later, I’m posting a screenshot of the Akismet stats graph from the WordPress dashboard showing the number of spam comments identified by Akismet before and after the system was implemented.

Akismet stats for August - December, 2010

The spike in spam comments detected around November 3rd occurred after an update to WordPress overwrote my altered wp-comments-post.php file. I replaced the file and the spam dropped back down to single digits per day.

A massive reduction in the number of spam comments

I’ve written a number of posts on ways to reduce the number of spam comments a blog receives. In this post, I’ll revisit an old method that has almost completely stopped spam comments at ardamis.com before they get to the database.

My first system for blocking WordPress comment spam was an overly complex combination of JavaScript and a challenge-response to test that the comment was being submitted by a person. The value of the action attribute in the form was not in the HTML when the page was loaded, so the form couldn’t be immediately submitted, then JavaScript was used to write the path to a renamed wp-comments-post.php file only after a certain user action was performed. I was never really satisfied with it. I didn’t like relying on JavaScript, I had doubts that any human being (meaning of any mental or physical capacity, speaking any language, etc.) could correctly answer the question, and I was concerned that any obstacle to submitting a form discourages legitimate commenting.

A few months later, I posted a simpler timestamp method for reducing WordPress comment spam that compares two timestamps and then rejects any form submission that occurs within 60 seconds of the post page being loaded. The visitor wasn’t bothered by an additional form field solely for anti-spam and there was no JavaScript involved.

Both methods were very effective at blocking spam before it made it to the database. In the five months leading up to the implementation of the first method, Akismet was catching an average of 1418 spam comments per month. In the first five months after these methods were put in place, Akismet was catching only 54 spam comments per month. But I also noticed a reduction in legitimate comments, from an average of 26 per month to 20 per month, which led me to suspect that real visitors attempting to leave comments were being discouraged from doing so.
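
Put as percentages, using only the monthly averages quoted above (the helper function is just for illustration):

```javascript
// Percentage drop from a "before" average to an "after" average.
function percentReduction(before, after) {
  return ((before - after) / before) * 100;
}

const spamDrop = percentReduction(1418, 54); // spam caught by Akismet per month
const legitDrop = percentReduction(26, 20);  // legitimate comments per month

console.log(spamDrop.toFixed(1) + '% fewer spam comments');        // 96.2% fewer spam comments
console.log(legitDrop.toFixed(1) + '% fewer legitimate comments'); // 23.1% fewer legitimate comments
```

So the spam reaching Akismet fell by about 96%, while legitimate comments fell by about 23%.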

The timestamp method required changing a core file, which was overwritten each time WordPress was updated. As time went on, I forgot to replace the file after upgrading WordPress, so the protection was lost and I once again had only Akismet blocking spam. A few months later, while doing work on the database in an attempt to speed up WordPress, I happened to check my historical stats and found that Akismet had detected 4,144 comments in July, 2010. Yikes. It was time to revisit these old methods.

At 2:30 AM on August 1, 2010, I again implemented my timestamp method, but this time I also renamed the wp-comments-post.php file that processes the form. I changed my theme’s comments.php file to submit the form to the new page, deleted the wp-comments-post.php file from the server and tested to make sure that comments could still be submitted. And then I waited to see what would happen.

The effect was pretty amazing. The spam had almost completely stopped.

My Akismet stats look like this:

Date	Spam
7.30.10	192
7.31.10	196
8.1.10	32
8.2.10	0
8.5.10	4
8.8.10	4
8.10.10	4
8.11.10	4
8.13.10	0
8.14.10	0

(I don’t know why so many dates in August are skipped in the log, but whatever.)

Fast, but only partial protection

The quick and easy way to reduce the number of spam comments that your WordPress blog receives is to merely change the location of the comment form processing script.

Rename wp-comments-post.php to anything else. I like using a string of random letters and digits, like: z1t0zVGuaCZEi.php.
Edit your current theme’s comments.php so that the form is submitted to this new file.
Upload these files to their respective directories, then delete the wp-comments-post.php file from your server.

This method works well to stop spam submitted by bots that assume the comment form processing script used by WordPress is always at the same location. More advanced bots will read the actual location of the file from the action attribute of the form element, but that can be countered by using either the JavaScript or timestamp method.

Access log analysis

To illustrate the effectiveness of the renamed wp-comments-post file + timestamp check, below are some events from my 06 August 2010 access log.

Bot defeated by renamed file alone

Here is a form submission to the non-existent wp-comments-post file that occurs 2 seconds after the post page is requested.

173.242.112.44 - - [06/Aug/2010:23:21:37 -0700] "GET www.ardamis.com/2007/07/12/defeating-contact-form-spam/ HTTP/1.0" 200 32530 "http://www.google.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"
173.242.112.44 - - [06/Aug/2010:23:21:39 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "//ardamis.com/2007/07/12/defeating-contact-form-spam/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"

The bot is sent a 404 HTTP status code, which is widely understood to mean that the page isn’t there and you can stop asking for it. But that doesn’t stop this bot! Less than two minutes later, it’s back at another page, trying again.

173.242.112.44 - - [06/Aug/2010:23:23:01 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.0" 200 101259 "http://www.google.com" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"
173.242.112.44 - - [06/Aug/2010:23:23:05 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "//ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"

Again, it gets a 404 back. Some bots never learn.

Bot defeated by timestamp check

Here is a form submission to the renamed wp-comments-post file that occurs 4 seconds after the post page is requested.

91.201.66.6 - - [06/Aug/2010:23:30:41 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.1" 200 21787 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"
91.201.66.6 - - [06/Aug/2010:23:30:45 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1227 "//ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"

The 500 HTTP status code indicates that this submission was denied and the comment never made it to the database. This access log doesn’t indicate which check stopped the POST (e.g., the email validation or the timestamp function), but my money is on the timestamp.

Here’s another form submission to the renamed wp-comments-post file that occurs one second after the post page is requested. Speed reader or bot?

95.220.185.210 - - [06/Aug/2010:23:56:54 -0700] "GET www.ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/ HTTP/1.1" 200 23977 "-" "Opera/9.01 (Windows NT 5.0; U; en)"
95.220.185.210 - - [06/Aug/2010:23:56:55 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1213 "//ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/" "Opera/9.01 (Windows NT 5.0; U; en)"

The submission is rejected.
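
The delays quoted above can also be pulled out of an access log mechanically. The following is a minimal sketch that assumes the log lines have the same layout as the excerpts in this post; the function names are hypothetical, and a real log would need more defensive parsing.

```javascript
// Month abbreviations as they appear in the access log timestamps.
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Convert a stamp like "06/Aug/2010:23:21:37 -0700" to milliseconds since the epoch.
function parseLogTime(stamp) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(stamp);
  if (!m) return NaN;
  const asUtc = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  const offsetMs = (+m[8] * 60 + +m[9]) * 60000 * (m[7] === '-' ? -1 : 1);
  return asUtc - offsetMs; // local time minus the zone offset gives UTC
}

// Extract IP, time, method, path, and status from one log line.
function parseLogLine(line) {
  const m = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) /.exec(line);
  if (!m) return null;
  return { ip: m[1], time: parseLogTime(m[2]), method: m[3], path: m[4], status: Number(m[5]) };
}

// For each POST, report the seconds elapsed since the same IP's most recent GET.
function postDelays(lines) {
  const lastGet = new Map();
  const results = [];
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry) continue;
    if (entry.method === 'GET') {
      lastGet.set(entry.ip, entry.time);
    } else if (entry.method === 'POST' && lastGet.has(entry.ip)) {
      results.push({ ip: entry.ip, path: entry.path, status: entry.status,
                     seconds: (entry.time - lastGet.get(entry.ip)) / 1000 });
    }
  }
  return results;
}

// Four lines copied from the excerpts above.
const sampleLines = [
  '173.242.112.44 - - [06/Aug/2010:23:21:37 -0700] "GET www.ardamis.com/2007/07/12/defeating-contact-form-spam/ HTTP/1.0" 200 32530 "http://www.google.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"',
  '173.242.112.44 - - [06/Aug/2010:23:21:39 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "//ardamis.com/2007/07/12/defeating-contact-form-spam/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"',
  '95.220.185.210 - - [06/Aug/2010:23:56:54 -0700] "GET www.ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/ HTTP/1.1" 200 23977 "-" "Opera/9.01 (Windows NT 5.0; U; en)"',
  '95.220.185.210 - - [06/Aug/2010:23:56:55 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1213 "//ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/" "Opera/9.01 (Windows NT 5.0; U; en)"'
];

const delays = postDelays(sampleLines);
for (const d of delays) {
  console.log(d.ip + ' posted after ' + d.seconds + 's and received ' + d.status);
}
```

Run against the sample lines, this reports a 2-second delay with a 404 for the first bot and a 1-second delay with a 500 for the second, matching the figures read off by hand.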

Taking the method even further

To take this method even further, one could send a 200 OK header even when the comment is blocked, so the bots never know their mission failed. But this seems unnecessary at this point, as it doesn’t appear that they change their behavior after being sent a 404 error, or that they try again after being sent a 500 error. It also makes it harder to figure out from the access logs which comments were rejected and for what reason.

If you still want to do this, first implement the timestamp method, then make the following modifications.

Sending a 200 header

$comment_timestamp    = isset($_POST['timestamp']) ? trim($_POST['timestamp']) : '';
$submitted_timestamp  = time();

if ( ! ctype_digit($comment_timestamp) ) {
// If the value for $_POST['timestamp'] is missing, empty, or not all digits, exit (the form wasn't submitted by the theme's comments.php)
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: It looks like this form was not submitted by the form at ' . get_option('siteurl') . '.</p>';
	exit;
}
if ( $submitted_timestamp - $comment_timestamp < 60 ) {
// If the form was submitted within 60 seconds of page load, exit
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: The comment was posted too soon after the page was loaded.  Please press the Back button on your browser and try again in a few seconds.</p>'; 
	exit;
}
// If the form was submitted more than 10 minutes after page load, die
if ( $submitted_timestamp - $comment_timestamp > 600 ) {
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: You waited too long before posting a comment.</p>';
	exit;
}

One could also write a record to a database each time the old wp-comments-post.php file is requested or any of the timestamp checks block a form submission, and pretty quickly generate a list of IP addresses for a black list. At the same time, one could log which timestamp check caught the spam attempt, which is interesting enough that I’ll probably do it eventually.

Using timestamps to reduce WordPress comment spam

Update 8.27.11: The method described in this post uses PHP to generate the timestamps. If your site is using a caching plugin, the timestamps in the HTML will be stale, and this method will not work. Please see my updated post at A cache-friendly method for reducing WordPress comment spam for a new method using JavaScript for sites that use page caching.

In this post, I’ll explain how to reduce the amount of comment spam your WordPress blog receives by using an unobtrusive ‘handshake’ between the two files necessary for a valid comment submission to take place. I’ve written a few different articles on reducing comment spam by means of a challenge response test that the visitor must complete before submitting a comment, but I’m now looking for ways to achieve the same results while keeping the anti-spam method invisible to the visitor.

I’m a big fan of Akismet, but I also want to block as much spam as possible before it is caught by Akismet in order to reduce the number of database entries.

One thing this method does not do is rename and hide the path to the form processing script, but it makes that technique obsolete, anyway.

In the timestamp handshake method, a first timestamp is generated and written as a hidden input field when the post page loads. When the comment is submitted, a second timestamp is generated by the comment-processing script and it and the page-load timestamp are saved as variables. If the page-load timestamp variable is blank, which should be the case if the spambot uses any other page to populate the comment, the script will die. The page-load timestamp is then subtracted from the comment-submission timestamp. If the comment was submitted less than 60 seconds after the post page was loaded, the script dies with a descriptive error message. Hopefully, this will separate the bots’ comments from those left by thoughtful human visitors who have taken the time to read your post. If a human visitor does happen to submit a comment within 60 seconds of the page loading, he or she can click his or her browser’s back button and try resubmitting the comment again in a few seconds.

One drawback is that this method does involve editing a core file – wp-comments-post.php. You’ll have to re-edit it each time you upgrade WordPress, which is a nuisance, I know. The good thing is that if you forget to do this, people can still comment – you just won’t have the anti-spam protection.

Note that the instructions in the following steps are based on the code in WordPress version 2.3 and the Kubrick theme included with that release. You may need to adjust for your version of WordPress.

Step 1 – Add the hidden timestamp input field to the comment form

Open the comments.php file in your current theme’s folder and find the following lines:

<p><textarea name="comment" id="comment" cols="100%" rows="10" tabindex="4"></textarea></p>

<p><input name="submit" type="submit" id="submit" tabindex="5" value="Submit Comment" />

Add the following line between them:

<p><input type="hidden" name="timestamp" id="timestamp" value="<?php echo time(); ?>" size="22" /></p>

Step 2 – Modify the wp-comments-post.php file to create the second timestamp and perform the comparison

Open wp-comments-post.php and find the lines:

$comment_author       = trim(strip_tags($_POST['author']));
$comment_author_email = trim($_POST['email']);
$comment_author_url   = trim($_POST['url']);
$comment_content      = trim($_POST['comment']);

Immediately after them, add the following lines:

$comment_timestamp    = isset($_POST['timestamp']) ? trim($_POST['timestamp']) : '';
$submitted_timestamp  = time();

// Reject a missing, empty, or non-numeric timestamp
if ( ! ctype_digit($comment_timestamp) )
	wp_die( __('Hello, spam bot!') );
	
if ( $submitted_timestamp - $comment_timestamp < 60 )
	wp_die( __('Error: you must wait at least 1 minute before posting a comment.') );

That’s it; you’re done.

Credits

Thanks to Jonathan Bailey for suggesting the handshake in his post at http://www.plagiarismtoday.com/2007/07/24/wordpress-and-comment-spam/.

Defeating WordPress comment spam

Comment spam comes from humans who are paid to post it and robots/scripts that do it automatically. The majority of spam comes from the bots. There’s very little one can do to defend against a determined human being, but bots tend to behave predictably, and that allows us to develop countermeasures.

From my observations, it seems that the spambots are first given a keyword phrase to hunt down. They go through the search engine results for pages with that keyword phrase, and follow the link to each page. If the page happens to be a WordPress post, they pass their spammy content to the comment form. Apparently, they do this in a few different ways. Some bots seem to inject their content directly into the form processing agent, a file in the blog root named wp-comments-post.php, using the WordPress default form field IDs. Other bots seem to fill in any fields they come across before submitting the form. Still others seem to fill out the form, but ignore any unexpected text input fields. All of these behaviors can be used against the spammers.

One anti-spam technique that has been used for years is to rename the script that handles the form processing. If you look at the HTML of a WordPress post with comments enabled, you’ll see a line that reads:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

The ‘wp-comments-post.php’ file handles the comment form processing, and a good amount of spam can be avoided by simply renaming it. Many bots will try to pass their content directly to that script, even if it no longer exists, to no effect.

More sophisticated bots will look through the HTML for the URL of the form processing script, and in doing so will learn the URL of the newly renamed script and pass their contents to that script instead. The trick is to prevent these smarter bots from discovering the URL of the new script. Because it seems that the bots aren’t looking through external JavaScript files yet, that’s where we will hide the URL. (If you do use this technique, it would be very considerate to tell your visitors that JavaScript is required to post comments.)

Step 1

Rename the wp-comments-post.php file to anything else. Using a string of random hexadecimal characters would be ideal. Once you’ve renamed the file, enter the address of the file in your browser. The page should be blank; if you get a 404 error, something is wrong. Make a note of that address, because you’ll need it later. Verify that the wp-comments-post.php file is empty or is no longer on your server.

(Because I was curious about how many bots were hitting the wp-comments-post.php file directly, I replaced the code with a hitcounter. Sure enough, bots are still hitting the file directly, even though there is no longer any path leading to it.)

Step 2

Open up the ‘comments.php’ file in your theme directory. Find the line:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

and change the value of the action attribute to a number sign (or pound sign, or hash), like so:

<form action="#" method="post" id="commentform">

Any bots that come to the page and search for the path to your comment processing script will just see the hash, so they will never discover the URL to the real script. This change also means that if a bot or a visitor tries to submit the form, the form will fail, because a WordPress single post page isn’t designed to process forms. We want the bots to fail, but we’ll need to put things right for humans.

If you are tempted to designate a separate page for the action value, note that the only people likely to ever see this page are visitors without JavaScript enabled who fill out the form.

Step 3

Create a new JavaScript file with the following code.

function commentScriptReveal() {
	// enter the URL of your renamed wp-comments-post.php file below
	var scriptPath = "http://yourdomain.com/renamed-wp-comments-post.php";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

Enter the address of your renamed file as the value for the variable scriptPath. The function commentScriptReveal, when called, will find the element with the ID ‘commentform’ (that’s the comment form) and change its action attribute to the URL of the renamed file, allowing the form to be successfully sent to the processing agent.

Save the file as ‘commentrevealer.js’ and upload it to the /scripts/ directory in your blog’s root. Add the script to your theme’s header.php file:

<script src="<?php bloginfo('url'); ?>/scripts/commentrevealer.js" type="text/javascript"></script>

Now we just need to decide how to call the commentScriptReveal function.

Step 4

The ideal method of calling the function would be the one where the human visitor always calls the function, and the bot never calls it. To do this, we need to know something about how the bots work.

Step 4a — For spam bots that ignore unexpected text input fields:

If the bots ignore unexpected text input fields, we can simply add a field, label it ‘required’, and attach the script revealer to that field with one of the following event handlers:

onchange triggered when the user changes the content of a field
onkeypress triggered when a keyboard key is pressed or held down
onkeydown triggered when a keyboard key is pressed
onkeyup triggered when a keyboard key is released
onfocus triggered when an element gets focus
onblur triggered when an element loses focus

I’m intentionally vague about how to trigger the function commentScriptReveal because this technique will stay effective longer if different people use different events. Furthermore, the text input field doesn’t necessarily need to do anything; its contents will just be discarded when the form is processed. In fact, it doesn’t even need to be a text input field. It can be any form control: a button, a checkbox, a radio button, a menu, etc. We just need human visitors to interact with it somehow. Those bots that skip over the control won’t trigger the revealer event, and your visitors (who always follow directions) will.

If everyone goes about implementing this method in a slightly different way, the spammers should find it much more difficult to counter.
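As one possible starting point, and only one, the revealer could be wired up from the external script instead of an inline attribute. The field ID ‘human-check’, the function name attachRevealer, and the choice of the focus event are all assumptions made for this sketch; pick your own.

```javascript
// Sketch only: "human-check" is a hypothetical ID for the extra field,
// and focus is just one of the events listed above.
function attachRevealer(field, form, scriptPath) {
	field.onfocus = function () {
		// Same effect as commentScriptReveal: point the form at the real script
		form.setAttribute("action", scriptPath);
	};
}

// In a browser, wait until the page has loaded so the elements exist.
if (typeof window !== "undefined" && typeof document !== "undefined") {
	window.addEventListener("load", function () {
		var field = document.getElementById("human-check");
		var form = document.getElementById("commentform");
		if (field && form) {
			attachRevealer(field, form, "http://yourdomain.com/renamed-wp-comments-post.php");
		}
	});
}
```

The form keeps its placeholder action until the visitor actually focuses the extra field, so a bot that never touches that field submits to “#”.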

For further reading on JavaScript events: QuirksMode – Javascript – Introduction to Events.

Step 4b — For spam bots that add text to every input field they come across:

If the bots are hitting every text input field with some text, follow Step 4a, and then create a second JavaScript file, named ‘commentconcealer.js’, with the following code:

function commentScriptConceal() {
	// reset the action to the placeholder so the real script path is hidden again
	var scriptPath = "#";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

The function commentScriptConceal re-rewrites the action attribute back to “#”.

Upload the file and add the script to your theme’s header.php file:

<script src="<?php bloginfo('url'); ?>/scripts/commentconcealer.js" type="text/javascript"></script>

Add another text input field somewhere below the one you added in Step 4a. Hide this field from visitors with {display: none;}. Call the function with an onfocus (or onblur, etc.) event on the second input field:

<p style="display: none;"><input type="text" name="reconceal" id="reconceal" value="" size="22" onfocus="return commentScriptConceal()" />
<label for="reconceal"><small>OMG Don't Touch This Field!?!?</small></label></p>

The legitimate visitors will never trigger this, but any bot that interacts with every field on a page will.
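The interplay between the revealer and the concealer can be walked through without a browser. The sketch below uses a stand-in object in place of the real form element, and the field names ‘revealer’ and ‘reconceal’ and the function simulateVisit are hypothetical labels for this illustration only.

```javascript
// Stand-in for the comment form (assumption: only the action attribute matters here)
function makeFakeForm() {
	var attrs = { action: "#" };
	return {
		setAttribute: function (name, value) { attrs[name] = value; },
		getAttribute: function (name) { return attrs[name]; }
	};
}

var REAL_PATH = "http://yourdomain.com/renamed-wp-comments-post.php";

function reveal(form) { form.setAttribute("action", REAL_PATH); }
function conceal(form) { form.setAttribute("action", "#"); }

// touchedFields: the fields a visitor interacts with, in page order.
// Returns the action the form would be submitted to.
function simulateVisit(touchedFields) {
	var form = makeFakeForm();
	for (var i = 0; i < touchedFields.length; i++) {
		if (touchedFields[i] === "revealer") { reveal(form); }
		if (touchedFields[i] === "reconceal") { conceal(form); }
	}
	return form.getAttribute("action");
}
```

A human who touches only the visible field ends up with the real path. A bot that touches both fields in page order ends up back at “#”. This is also why the hidden field belongs below the visible one: if the bot reached the hidden field first, the later reveal would win.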

Step 5

If you chose to use a text input field in Step 4a, consider making that a challenge-response test. It’s always a good idea to use a server-side check to back up a client-side check. I like to ask the question “What color is an orange?” as a tip of the hat to Eric Meyer. This challenge-response test can be integrated into wp-comments-post.php, so that if the user fails the test, the form submission dies in the same way it would if a required field were left blank.

For this example, let’s ask the question “What color is Kermit the Frog?” The answer, of course, is green.

Open your wp-comments-post.php file, and (in WP version 2.2.2) somewhere around line 31 add:

$comment_verify = isset($_POST['verify']) ? strtolower(trim($_POST['verify'])) : '';

This will trim any whitespace from the answer to the challenge question, and convert the characters to lowercase.

Around line 45, find the lines:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
}

And add the following two lines to them, like so:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
	if ( $comment_verify != 'green' )
		wp_die( __('Sorry, you must correctly answer the "Kermit" question to post a comment.') );
}

This will check the visitor’s answer against the correct answer, and if they don’t match, the script will stop and the comment won’t be submitted. The visitor can hit the back button to change the answer.

Now we need to add the challenge question to the theme file comments.php. Find the lines:

<p><input type="text" name="url" id="url" value="<?php echo $comment_author_url; ?>" size="22" tabindex="3" />
<label for="url"><small>Website</small></label></p>

And right below them, add the following lines:

<p><input type="text" name="verify" id="verify" value="" size="22" tabindex="4" />
<label for="verify"><small>What color is Kermit the Frog? (anti-spam) (required)</small></label></p>

(You may need to update all the tabindex numbers after you add this field.)

That’s it. Your visitors will now have to correctly answer the Kermit the Frog question to submit the form.

Further customization

The methods described here just scratch the surface of what can be done to obfuscate the comment handling script. For example, the URL could be broken up into parts and saved as multiple variables. It could have a name comprised of numbers, and the commentScriptReveal JavaScript could perform math to assemble it.
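A minimal sketch of that idea follows. It assumes, purely for illustration, that the processing script has been renamed to ‘4096-1764.php’; the function name assembleScriptPath and the particular arithmetic are invented for this example.

```javascript
// Sketch only: assumes the renamed script is "4096-1764.php" (hypothetical).
// The full file name never appears as a literal string in this file.
function assembleScriptPath() {
	var base = "http://yourdomain.com/";
	var partA = 64 * 64;   // 4096
	var partB = 42 * 42;   // 1764
	return base + partA + "-" + partB + ".php";
}

function commentScriptReveal(form) {
	form.setAttribute("action", assembleScriptPath());
}
```

A bot that scans the JavaScript for anything that looks like a URL ending in .php would find only the domain and a file extension, not the complete path.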

I can’t imagine I’m the first person to come up with this, but… the user could be required to complete a challenge-response question, the correct answer to which would be used as part of the name of the script—the URL to the script doesn’t exist anywhere until the visitor creates it.
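Roughly, that could look like the sketch below. It assumes the script has been renamed to ‘comments-green.php’ to match the Kermit question; that file name and the function buildScriptPath are hypothetical.

```javascript
// Sketch only: assumes the renamed script is "comments-green.php" (hypothetical),
// so the correct answer to the challenge completes the file name.
function buildScriptPath(answer) {
	// Trim whitespace and lowercase, like the server-side check does
	var normalized = String(answer).replace(/^\s+|\s+$/g, "").toLowerCase();
	return "http://yourdomain.com/comments-" + encodeURIComponent(normalized) + ".php";
}

function revealFromAnswer(form, answer) {
	form.setAttribute("action", buildScriptPath(answer));
}
```

A wrong answer produces the address of a file that does not exist, so the submission goes nowhere, and the correct address appears nowhere in the page source or the script.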

But, there are some people who don’t want to impose even a minor inconvenience on their visitors. If you don’t like challenge-response tests, what about using JavaScript to invisibly check the screen resolution of the user agent? And we haven’t even considered using cookies yet.
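A screen check might be sketched as follows. It rests on an untested assumption: that a bot without a real display reports missing or zero screen dimensions. The function names hasPlausibleScreen and maybeReveal are made up for this example.

```javascript
// Sketch only. Assumption: a script-driven bot with no real display
// reports no screen object, or a width/height of zero.
function hasPlausibleScreen(screenInfo) {
	if (!screenInfo) {
		return false;
	}
	return screenInfo.width > 0 && screenInfo.height > 0;
}

// Reveal the real script path only when the screen looks real.
function maybeReveal(form, screenInfo, scriptPath) {
	if (hasPlausibleScreen(screenInfo)) {
		form.setAttribute("action", scriptPath);
		return true;
	}
	return false;
}
```

In a browser, this would be called with document.getElementById("commentform") and window.screen once the page has loaded. The visitor never sees the check, but it should still be backed up by something on the server.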

Credits

Many thanks to Jeff Barr for demonstrating how to put a challenge-response test into the wp-comments-post.php file in his post WordPress Comment Verification (With Source Code). I had been using only JavaScript for validation up until that point, shame on me.

Many thanks also to Will Bontrager for writing a provocative explanation of how to temporarily hide the form processing script using JavaScript in Spamming You Through Your Own Forms.

Huge thanks to Eric Meyer, who wrote a pre-plugin, challenge-response test script called WP-Gatekeeper for a very early version of WordPress.

I’ve described two similar methods for defeating contact form spam by hiding the webmail script in an earlier post.