In the endless battle against WordPress comment spam, I’ve developed and then refined a few different methods for preventing spam from getting to the database to begin with. My philosophy has always been that a human visitor and a spam bot behave differently (after all, we’re not dealing with Nexus-6 model androids here), and an effective spam-prevention method should be able to recognize the differences. I also have a dislike for CAPTCHA methods that require a human visitor to prove, via an intentionally difficult test, that they aren’t a bot. The ideal method, I feel, would be invisible to a human visitor, but still accurately identify comments submitted by bots.
A history of spam fighting
The most successful and simple method I found was a server-side system for reducing comment spam by using a handshake method involving timestamps on hidden form fields. The general idea was that a bot would submit a comment more quickly than a human visitor, so if the comment was submitted too soon after the page was loaded, it was rejected. A human caught in this trap would be able to click the Back button on the browser to resubmit. This had proven to be very effective on ardamis.com, cutting the number of spam comments intercepted by Akismet per day to nearly zero. For a long time, the only problem was that it required modifying a core WordPress file, wp-comments-post.php. Each time WordPress was updated, the core file was replaced. If I didn’t then go back and make my modifications again, I would lose the spam protection until I made the changes. As it became easier to update WordPress (via the admin panel) and I updated it more frequently, editing the core file became more of a nuisance.
A huge facepalm
When Google began weighting page load times as part of its ranking algorithm, I implemented the WP Super Cache caching plugin on ardamis.com and configured it to use .htaccess and mod_rewrite to serve cache files. Page load times certainly decreased, but the amount of spam detected by Akismet increased. After a while, I realized that this was because the spam bots were submitting comments from static, cached pages, and the timestamps on those pages, which had been generated server-side with PHP, were already minutes old when the page was requested. The form processing script, which normally rejects comments that are submitted too quickly to be written by a human visitor, happily accepted the timestamps. Even worse, a second function of my anti-spam method also rejected comments that were submitted 10 minutes or more after the page was loaded. Of course, most of the visitors were being served cached pages that were already more than 10 minutes old, so even legitimate comments were being rejected. Using PHP to generate my timestamps obviously was not going to work if I wanted to keep serving cached pages.
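To make the failure concrete, here is a minimal sketch of the kind of time-window check described above. It is written in JavaScript purely for illustration (the original check ran server-side in PHP). The function name isWithinWindow and the 10-second lower limit are assumptions; only the 10-minute upper limit comes from the description above.

```javascript
// Hypothetical sketch of a timestamp window check. Names and the lower
// limit are illustrative assumptions, not the original PHP code.
const MIN_AGE_MS = 10 * 1000;       // assumed minimum time a human needs
const MAX_AGE_MS = 10 * 60 * 1000;  // the 10-minute upper limit

function isWithinWindow(pageTimestampMs, submitTimestampMs) {
  const age = submitTimestampMs - pageTimestampMs;
  return age >= MIN_AGE_MS && age <= MAX_AGE_MS;
}

const now = Date.now();
const cachedFiveMinutesAgo = now - 5 * 60 * 1000;
const cachedAnHourAgo = now - 60 * 60 * 1000;

// A bot that submits the instant it fetches a page cached five minutes ago
// looks as if it waited five minutes, so it slips through.
const botAccepted = isWithinWindow(cachedFiveMinutesAgo, now);

// A human who spends 30 seconds on a page cached an hour ago looks as if
// they took over an hour, so the legitimate comment is rejected.
const humanAccepted = isWithinWindow(cachedAnHourAgo, now + 30 * 1000);
```

With a timestamp baked into a cached page, the check measures the age of the cache file rather than the visitor’s behavior, which is why both outcomes come out backwards.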
JavaScript to the rescue
Generating real-time timestamps on cached pages requires JavaScript. But instead of a reliable server clock setting the timestamp, the time is coming from the visitor’s system, which can’t be trusted to be accurate. Merely changing the comment form to use JavaScript to generate the first timestamp wouldn’t work, because verifying a timestamp generated on the client side against one generated on the server would be disastrous: the two clocks can disagree by far more than the few seconds being measured.
Replacing the PHP-generated timestamps with JavaScript-generated timestamps would require substantial changes to the system.
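A small sketch of the clock problem, using made-up numbers: the example instant and the three-hour skew are assumptions chosen only to show the arithmetic. Two timestamps taken from the same (wrong) clock still give the right elapsed time, while mixing a client timestamp with a server timestamp does not.

```javascript
// Illustrative only: simulate a visitor whose system clock is wrong.
const serverNow = 1330000000000;       // arbitrary example instant, in ms
const skewMs = -3 * 60 * 60 * 1000;    // assumed: visitor's clock is 3 hours slow
const actualWaitMs = 15000;            // the visitor really waits 15 seconds

// What each clock reports at page load and at form submission
const clientLoad = serverNow + skewMs;
const clientSubmit = clientLoad + actualWaitMs;
const serverReceive = serverNow + actualWaitMs;

// Both readings from the client clock: the skew cancels out
const sameClockElapsed = clientSubmit - clientLoad;

// Client reading compared against a server reading: the skew is added in
const mixedClockElapsed = serverReceive - clientLoad;
```

Here sameClockElapsed is the true 15 seconds, while mixedClockElapsed is off by the full three hours. That is the reason both timestamps need to come from the same clock.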
Traditional client-side form validation using JavaScript happens when the form is submitted. If the validation fails, the form is not submitted, and the visitor typically gets an alert with suggestions on how to make the form acceptable. If the validation passes, the form submission continues without bothering the visitor. To get our two timestamps, we can generate a first timestamp when the page loads and compare it to a second timestamp generated when the form is submitted. If the visitor submits the form too quickly, we can display an alert showing the number of seconds remaining until the form can be successfully submitted. This should be invisible to most visitors who choose to leave comments, and at the very least it is far less irritating than a CAPTCHA system.
It took me two tries to get it right, but I’m going to discuss the less successful method first to point out its flaws.
Method One (not good enough)
Here’s how the original system flowed.
- Generate a first JS timestamp when the page is loaded.
- Generate a second JS timestamp when the form is submitted.
- Before the form is submitted, compare the two, and if enough time has passed, write a pre-determined passcode to a hidden INPUT element, then submit the form.
- On the form processing page, use server-side logic to verify that the passcode is present and valid.
The problem was that it seemed that certain bots could parse JavaScript enough to drop the pre-determined passcode into the hidden form field before submitting the form, circumventing the timestamps completely and defeating the system.
It also failed to adhere to one of the basic tenets of form validation: that the input must be checked on both the client side and the server side.
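I never confirmed exactly how the bots did it, but the following sketch shows one plausible way. The page source, the field name ardCode, and the passcode value are all invented for illustration. The point is only that a passcode written as a literal in the page’s JavaScript can be read without running the script or waiting out the timer.

```javascript
// Hypothetical Method One script as a bot would see it in the page source.
// The field name and passcode are invented for this example.
const pageSource = `
function validate() {
  if (new Date().getTime() - loadTime > 10000) {
    document.getElementById("ardCode").value = "s3cret-passcode";
    return true;
  }
  return false;
}`;

// A bot only has to find the string literal assigned to the hidden field.
function extractPasscode(source) {
  const match = source.match(/\.value\s*=\s*"([^"]+)"/);
  return match ? match[1] : null;
}

const stolen = extractPasscode(pageSource);
```

Once the bot has the passcode, it can post it directly to the processing script, and a server that only checks for the passcode’s presence has no way to tell the difference.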
Method Two (better)
Rather than having the server-side validation be merely a check to confirm that the passcode is present, method two goes back to comparing the timestamps a second time on the server side. Instead of a single hidden input, we now have two – one for each timestamp. This is intended to prevent a bot from figuring out the ultimate validation mechanism by simply parsing the JavaScript. Finally, the hidden fields are added to the form via jQuery, which makes it easier to implement and may act as another layer of obfuscation.
- Generate a first JS timestamp when the page is loaded and write it to a hidden form field.
- Generate a second JS timestamp when the form is submitted and write it to a hidden form field.
- Before the form is submitted, compare the two, and if enough time has passed, submit the form (client-side validation).
- On the form processing page, use server-side logic to compare the timestamps a second time (server-side validation).
The timestamp handshake works more like it did in the server-side-only method. We still have to pass something from the comment form to the processing script, but it’s not too obvious from the HTML what is being done with it.
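The server-side comparison in the last step can be sketched like this. It is written in JavaScript only to illustrate the logic; the object post stands in for the submitted form fields, and the function name serverAccepts is an assumption. The field names ardTS1 and ardTS2 are the ones used in the code later in this post.

```javascript
// Sketch of the server-side re-check, for illustration only.
const MIN_ELAPSED_MS = 10000;

function serverAccepts(post) {
  // Missing fields become NaN and are rejected outright
  const ts1 = Number(post.ardTS1);
  const ts2 = Number(post.ardTS2);
  if (!Number.isFinite(ts1) || !Number.isFinite(ts2)) {
    return false;
  }
  // Accept only if at least 10 seconds separate the two timestamps
  return ts2 - ts1 >= MIN_ELAPSED_MS;
}

// No JavaScript ran, so the hidden fields were never added to the form
const noScript = serverAccepts({});
// Fields present, but submitted 2 seconds after the page loaded
const tooFast = serverAccepts({ ardTS1: "1000", ardTS2: "3000" });
// Fields present, submitted 19 seconds after the page loaded
const plausible = serverAccepts({ ardTS1: "1000", ardTS2: "20000" });
```

A submission that skips the JavaScript entirely never carries the two fields, so it is rejected on the server even though the client-side check never ran.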
The same downside plagues me
Unfortunately, if we want to have any server-side validation at all, and we do, the core file wp-comments-post.php will still have to be modified. In my experience, the system is only modestly effective using just client-side validation.
The code
Two files must be modified to implement the validation.
File 1: The theme’s comments.php file (older themes) or wp-includes/comment-template.php (newer themes)
Your comment form lives somewhere. My theme is based on Kubrick, the old default WordPress theme, and my comment form is in my theme folder, in a file named comments.php. If your theme is newer and based on the current default theme, twentyeleven, the form is in wp-includes/comment-template.php. If your theme isn’t based on either of these, all bets are off. I know it’s confusing. Sorry.
Add the JavaScript that creates and populates the timestamp fields. Be sure to confirm that your comment form has an ID of commentform. I’m using jQuery to help fire functions when the page loads.
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    ardGenTS1();
});

function ardGenTS1() {
    // prepare the form: add the two hidden timestamp fields and hook up validation
    $('#commentform').append('<input type="hidden" name="ardTS1" id="ardTS1" value="1" />');
    $('#commentform').append('<input type="hidden" name="ardTS2" id="ardTS2" value="1" />');
    $('#commentform').attr('onsubmit', 'return validate()');
    // set a first timestamp when the page loads
    var ardTS1 = (new Date).getTime();
    document.getElementById("ardTS1").value = ardTS1;
}

function validate() {
    // read the first timestamp
    var ardTS1 = document.getElementById("ardTS1").value;
    // alert ('ardTS1: ' + ardTS1);
    // generate the second timestamp
    var ardTS2 = (new Date).getTime();
    document.getElementById("ardTS2").value = ardTS2;
    // alert ('ardTS2: ' + document.getElementById("ardTS2").value);
    // find the difference in milliseconds
    var diff = ardTS2 - ardTS1;
    var elapsed = Math.round(diff / 1000);
    // whole seconds still to wait, rounded up so the alert never reports 0 seconds
    var remaining = Math.max(1, Math.ceil((10000 - diff) / 1000));
    // alert ('diff: ' + diff + '\n\nelapsed:' + elapsed);
    // check whether enough time has elapsed
    if (diff > 10000) {
        // submit the form
        return true;
    } else {
        // display an alert if the form is submitted within 10 seconds
        alert("This site is protected by an anti-spam feature that requires 10 seconds to have elapsed between the page load and the form submission.\n\nPlease close this alert window. The form may be resubmitted successfully in " + remaining + " seconds.");
        // prevent the form from being submitted
        return false;
    }
}
</script>
File 2: The wp-comments-post.php file
The wp-comments-post.php file lives in the root of WordPress and handles the form processing. We need to add a few lines that read our two new hidden timestamp fields and compare them.
Somewhere after line 53 or so (where $comment_content is defined), insert the following code.
// Read the two JavaScript timestamps (in milliseconds); missing or non-numeric values fall back to defaults that fail the check
$ardTS1 = ( isset($_POST['ardTS1']) && is_numeric($_POST['ardTS1']) ) ? (float) $_POST['ardTS1'] : 1;
$ardTS2 = ( isset($_POST['ardTS2']) && is_numeric($_POST['ardTS2']) ) ? (float) $_POST['ardTS2'] : 2;
$ardTS = $ardTS2 - $ardTS1;
if ( $ardTS < 10000 ) {
    // If fewer than 10 seconds separate the two timestamps, exit
    wp_die( __('<strong>ERROR</strong>: This site uses JavaScript validation to reduce comment spam. Either your browser has JavaScript disabled, or the comment was not legitimately submitted.') );
}
That’s it. Not so bad, right?
Final thoughts
One advantage to this method over the old PHP-only method is that even if the core file is replaced and the server-side validation is lost, the client-side validation continues to work, providing some measure of protection. The screen-shot at the beginning of the post shows the number of spam comments submitted to ardamis.com and detected by Akismet each day from the end of January, 2012, to the beginning of March, 2012. The dramatic drop-off around Jan 20 was when I implemented the method described in this post. The flare-up around Feb 20 was when I updated WordPress and forgot to replace the modified core file for about a week.
Now, for a little extra protection, you can rename the wp-comments-post.php file and change the path in the comment form’s action attribute. I’ve posted logs showing that some bots just try to post spam directly to the wp-comments-post.php file, so renaming that file is an easy way to cut down on spam. Just remember to come back and delete the wp-comments-post.php file each time you update WordPress.

