Tag Archives: spam

In the endless battle against WordPress comment spam, I’ve developed and then refined a few different methods for preventing spam from getting to the database to begin with. My philosophy has always been that a human visitor and a spam bot behave differently (after all, we’re not dealing with Nexus-6 model androids here), and an effective spam-prevention method should be able to recognize the differences. I also have a dislike for CAPTCHA methods that require a human visitor to prove, via an intentionally difficult test, that they aren’t a bot. The ideal method, I feel, would be invisible to a human visitor, but still accurately identify comments submitted by bots.

Spam on ardamis.com - before and after

Spam on ardamis.com - before and after

A history of spam fighting

The most successful and simple method I found was a server-side system for reducing comment spam by using a handshake method involving timestamps on hidden form fields. The general idea was that a bot would submit a comment more quickly than a human visitor, so if the comment was submitted too soon after the page was loaded, it was rejected. A human caught in this trap would be able to click the Back button on the browser to resubmit. This had proven to be very effective on ardamis.com, cutting the number of spam comments intercepted by Akismet per day to nearly zero. For a long time, the only problem was that it required modifying a core WordPress file, wp-comments-post.php. Each time WordPress was updated, the core file was replaced. If I didn’t then go back and make my modifications again, I would lose the spam protection until I made the changes. As it became easier to update WordPress (via the admin panel) and I updated it more frequently, editing the core file became more of a nuisance.

A huge facepalm

When Google began weighting page load times as part of its ranking algorithm, I implemented the WP Super Cache caching plugin on ardamis.com and configured it to use .htaccess and mod_rewrite to serve cache files. Page load times certainly decreased, but the amount of spam detected by Akismet increased. After a while, I realized that this was because the spam bots were submitting comments from static, cached pages, and the timestamps on those pages, which had been generated server-side with PHP, were already minutes old when the page was requested. The form processing script, which normally rejects comments that are submitted too quickly to be written by a human visitor, happily accepted the timestamps. Even worse, a second function of my anti-spam method also rejected comments that were submitted 10 minutes or more after the page was loaded. Of course, most of the visitors were being served cached pages that were already more than 10 minutes old, so even legitimate comments were being rejected. Using PHP to generate my timestamps obviously was not going to work if I wanted to keep serving cached pages.

JavaScript to the rescue

Generating real-time timestamps on cached pages requires JavaScript. But instead of a reliable server clock setting the timestamp, the time is coming from the visitor’s system, which can’t be trusted to be accurate. Merely changing the comment form to use JavaScript to generate the first timestamp wouldn’t work, because verifying a timestamp generated on the client-side against one generated with a server-side language would be disastrous.

Replacing the PHP-generated timestamps with JavaScript-generated timestamps would require substantial changes to the system.

Traditional client-side form validation using JavaScript happens when the form is submitted. If the validation fails, the form is not submitted, and the visitor typically gets an alert with suggestions on how to make the form acceptable. If the validation passes, the form submission continues without bothering the visitor. To get our two timestamps, we can generate a first timestamp when the page loads and compare it to a second timestamp generated when the form is submitted. If the visitor submits the form too quickly, we can display an alert showing the number of seconds remaining until the form can be successfully submitted. This should hopefully be invisible to most visitors who choose to leave comments, but at the very least, far less irritating than a CAPTCHA system.

It took me two tries to get it right, but I’m going to discuss the less successful method first to point out its flaws.

Method One (not good enough)

Here’s how the original system flowed.

  1. Generate a first JS timestamp when the page is loaded.
  2. Generate a second JS timestamp when the form is submitted.
  3. Before the form is submitted, compare the two, and if enough time has passed, write a pre-determined passcode to a hidden INPUT element, then submit the form.
  4. On the form processing page, use server-side logic to verify that the passcode is present and valid.

The problem was that it seemed that certain bots could parse JavaScript enough to drop the pre-determined passcode into the hidden form field before submitting the form, circumventing the timestamps completely and defeating the system.

It also failed to adhere to one of the basic tenants of form validation – that the input must be checked on both the client-side and the server-side.

Method Two (better)

Rather than having the server-side validation be merely a check to confirm that the passcode is present, method two goes back to comparing the timestamps a second time on the server side. Instead of a single hidden input, we now have two – one for each timestamp. This is intended to prevent a bot from figuring out the ultimate validation mechanism by simply parsing the JavaScript. Finally, the hidden fields are added to the form via jQuery, which makes it easier to implement and may act as another layer of obfuscation.

  1. Generate a first JS timestamp when the page is loaded and write it to a hidden form field.
  2. Generate a second JS timestamp when the form is submitted and write it to a hidden form field.
  3. Before the form is submitted, compare the two, and if enough time has passed, submit the form (client-side validation).
  4. On the form processing page, use server-side logic to compare the timestamps a second time (server-side validation).

The timestamp handshake works more like it did in the server-side-only method. We still have to pass something from the comment form to the processing script, but it’s not too obvious from the HTML what is being done with it.

The same downside plagues me

Unfortunately, if we want to have any server-side validation at all, and we do, the core file wp-comments-post.php will still have to be modified. In my experience, the system is only modestly effective using just client-side validation.

The code

Two files must be modified to implement the validation.

File 1: The theme’s comments.php file (older themes) or wp-includescomment-template.php (newer themes)

Your comment form lives somewhere. My theme is based on Kubrick, the old default WordPress theme, and my comment form is in my theme folder, in a file named comments.php. If your theme is newer and based on the current default theme, twentyeleven, the form is in wp-includescomment-template.php. If your theme isn’t based on either of these, all bets are off. I know it’s confusing. Sorry.

Add the JavaScript that creates and populates the timestamp fields. Be sure to confirm that your comment form has an ID of commentform. I’m using jQuery to help fire functions when the page loads.

<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
	ardGenTS1();
});

function ardGenTS1() {
	// prepare the form
	$('#commentform').append('<input type="hidden" name="ardTS1" id="ardTS1" value="1" />');
	$('#commentform').append('<input type="hidden" name="ardTS2" id="ardTS2" value="1" />');
	$('#commentform').attr('onsubmit', 'return validate()');
	// set a first timestamp when the page loads
	var ardTS1 = (new Date).getTime();
	document.getElementById("ardTS1").value = ardTS1;
}

function validate() {
	// read the first timestamp
	var ardTS1 = document.getElementById("ardTS1").value;
//	alert ('ardTS1: ' + ardTS1);
	// generate the second timestamp
	var ardTS2 = (new Date).getTime();
	document.getElementById("ardTS2").value = ardTS2;
//	alert ('ardTS2: ' + document.getElementById("ardTS2").value);
	// find the difference
	var diff = ardTS2 - ardTS1;
	var elapsed = Math.round(diff / 1000);
	var remaining = 10 - elapsed;
//	alert ('diff: ' + diff + 'nnelapsed:' + elapsed);
	// check whether enough time has elapsed
	if (diff > 10000) {
		// submit the form
		return true;
	}else{
		// display an alert if the form is submitted within 10 seconds
		alert("This site is protected by an anti-spam feature that requires 10 seconds to have elapsed between the page load and the form submission.nnPlease close this alert window.  The form may be resubmitted successfully in " + remaining + " seconds.");
		// prevent the form from being submitted
		return false;
	}
}
</script>

File 2: The wp-comments-post.php file

The wp-comments-post.php file lives in the root of WordPress and handles the form processing. We need to add a few lines that check the contents of our new validation input field.

Somewhere after line 53 or so (where $comment_content is defined), insert the following code.

$ardTS1 = ( isset($_POST['ardTS1']) ) ? trim($_POST['ardTS1']) : 1;
$ardTS2 = ( isset($_POST['ardTS2']) ) ? trim($_POST['ardTS2']) : 2;
$ardTS = $ardTS2 - $ardTS1;

if ( $ardTS < 10000 ) {
// If the difference of the timestamps is not more than 10 seconds, exit
    wp_die( __('<strong>ERROR</strong>:  This site uses JavaScript validation to reduce comment spam.  Either your browser has JavaScript disabled, or the comment was not legitimately submitted.') );
}

That’s it. Not so bad, right?

Final thoughts

One advantage to this method over the old PHP-only method is that even if the core file is replaced and the server-side validation is lost, the client-side validation continues to work, providing some measure of protection. The screen-shot at the beginning of the post shows the number of spam comments submitted to ardamis.com and detected by Akismet each day from the end of January, 2012, to the beginning of March, 2012. The dramatic drop-off around Jan 20 was when I implemented the method described in this post. The flare-up around Feb 20 was when I updated WordPress and forgot to replace the modified core file for about a week.

Now, for a little extra protection, you can rename the wp-comments-post.php file and change the path in the comment form’s action attribute. I’ve posted logs showing that some bots just try to post spam directly to the wp-comments-post.php file, so renaming that file is an easy way to cut down on spam. Just remember to come back and delete the wp-comments-post.php file each time you update WordPress.

In August, 2010, I described a simple method for dramatically reducing the number of spam comments that are submitted to a WordPress blog. The spam comments are rejected before they are checked by Akismet, so they never make it into the database at all.

Now, a few months later, I’m posting a screenshot of the Akismet stats graph from the WordPress dashboard showing the number of spam comments identified by Akismet before and after the system was implemented.

Akismet stats for August - December, 2010

The spike in spam comments detected around November 3rd occurred after an update to WordPress overwrote my altered wp-comments.php file. I replaced the file and the spam dropped back down to single digits per day.

I’ve written a number of posts on ways to reduce the number of spam comments a blog receives. In this post, I’ll revisit an old method that has almost completely stopped spam comments at ardamis.com before they get to the database.

My first system for blocking WordPress comment spam was an overly complex combination of JavaScript and a challenge-response to test that the comment was being submitted by a person. The value of the action attribute in the form was not in the HTML when the page was loaded, so the form couldn’t be immediately submitted, then JavaScript was used to write the path to a renamed wp-comments-post.php file only after a certain user action was performed. I was never really satisfied with it. I didn’t like relying on JavaScript, I had doubts that any human being (meaning of any mental or physical capacity, speaking any language, etc.) could correctly answer the question, and I was concerned that any obstacle to submitting a form discourages legitimate commenting.

A few months later, I posted a simpler timestamp method for reducing WordPress comment spam that compares two timestamps and then rejects any form submission that occurrs within 60 seconds of the post page being loaded. The visitor wasn’t bothered by an additional form field solely for anti-spam and there was no JavaScript involved.

Both methods were very effective at blocking spam before it made it to the database. In the five months leading up to the implementation of the first method, Akismet was catching an average of 1418 spam comments per month. In the first five months after these methods were put in place, Akismet was catching only 54 spam comments per month. But I also noticed a reduction in legitimate comments, from an average of 26 per month to 20 per month, which led me to suspect that real visitors attempting to leave comments were being discouraged from doing so.

The timestamp method required changing a core file, which was overwritten each time WordPress was updated. As time went on, I forgot to replace the file after upgrading WordPress, so the protection was lost and I once again had only Akismet blocking spam. A few months later, while doing work on the database in an attempt to speed up WordPress, I happened to check my historical stats and found that Akismet had detected 4,144 comments in July, 2010. Yikes. It was time to revisit these old methods.

At 2:30 AM on August 1, 2010, I again implemented my timestamp method, but this time I also renamed the wp-comments-post.php file that processes the form. I changed my theme’s comments.php file to submit the form to the new page, deleted the wp-comments-post.php file from the server and tested to make sure that comments could still be submitted. And then I waited to see what would happen.

The effect was pretty amazing. The spam had almost completely stopped.

My Akismet stats look like this:

Date Spam
7.30.10 192
7.31.10 196
8.1.10 32
8.2.10 0
8.5.10 4
8.8.10 4
8.10.10 4
8.11.10 4
8.13.10 0
8.14.10 0

(I don’t know why so many dates in August are skipped in the log, but whatever.)

Fast, but only partial protection

The quick and easy way to reduce the number of spam comments that your WordPress blog receives is to merely change the location of the comment form processing script.

  1. Rename wp-comments-post.php to anything else. I like using a string of random hexadecimal characters, like: z1t0zVGuaCZEi.php.
  2. Edit your current theme’s comments.php so that the form is submitted to this new file.
  3. Upload these files to their respective directories, then delete the wp-comments-post.php file from your server.

This method works well to stop spam submitted by bots that assume the comment form processing script used by WordPress is always at the same location. More advanced bots will read the actual location of the file from the action attribute of the form element, but that can be countered by using either the JavaScript or timestamp method.

Access log analysis

To illustrate the effectiveness of the renamed wp-comments-post file + timestamp check, below are some events from my 06 August 2010 access log.

Bot defeated by renamed file alone

Here is a form submission to the non-existent wp-comments-post file that occurs 2 seconds after the post page is requested.

173.242.112.44 - - [06/Aug/2010:23:21:37 -0700] "GET www.ardamis.com/2007/07/12/defeating-contact-form-spam/ HTTP/1.0" 200 32530 "http://www.google.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"
173.242.112.44 - - [06/Aug/2010:23:21:39 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "http://www.ardamis.com/2007/07/12/defeating-contact-form-spam/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50"

The bot is sent a 404 HTTP status code, which is widely understood to mean that the page isn’t there and you can stop asking for it. But that doesn’t stop this bot! Two minutes later, it’s back at another page, trying again.

173.242.112.44 - - [06/Aug/2010:23:23:01 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.0" 200 101259 "http://www.google.com" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"
173.242.112.44 - - [06/Aug/2010:23:23:05 -0700] "POST www.ardamis.com/wp-comments-post.php HTTP/1.0" 404 15529 "http://www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1"

Again, it gets a 404 back. Some bots never learn.

Bot defeated by timestamp check

Here is a form submission to the renamed wp-comments-post file that occurs 4 seconds after the post page is requested.

91.201.66.6 - - [06/Aug/2010:23:30:41 -0700] "GET www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/ HTTP/1.1" 200 21787 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"
91.201.66.6 - - [06/Aug/2010:23:30:45 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1227 "http://www.ardamis.com/2007/03/29/xbox-360-gamercard-wordpress-plugin/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)"

The 500 HTTP header indicates that this submission was denied and the comment never made it to the database. This access log doesn’t indicate which check stopped the POST (eg: the email validation or the timestamp function), but my money is on the timestamp.

Here’s another form submission to the renamed wp-comments-post file that occurs one second after the post page is requested. Speed reader or bot?

95.220.185.210 - - [06/Aug/2010:23:56:54 -0700] "GET www.ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/ HTTP/1.1" 200 23977 "-" "Opera/9.01 (Windows NT 5.0; U; en)"
95.220.185.210 - - [06/Aug/2010:23:56:55 -0700] "POST www.ardamis.com/wp-comments-post-timestamp-3.0.1.php HTTP/1.1" 500 1213 "http://www.ardamis.com/2010/02/26/fixing-word-2007-add-in-issues/" "Opera/9.01 (Windows NT 5.0; U; en)"

The submission is rejected.

Taking the method even further

To take this method even further, one could send a 200 OK header even when the comment is blocked, so the bots never know their mission failed. But this seems unnecessary at this point, as it doesn’t appear that they change their behavior after being sent a 404 error, or that they try again after being sent a 500 error. It also makes it harder to figure out from the access logs which comments were rejected and for what reason.

If you still want to do this, first implement the timestamp method, then make the following modifications.

Sending a 200 header

$comment_timestamp    = trim($_POST['timestamp']);
$submitted_timestamp  = time();

if ( $comment_timestamp == '' ) {
// If the value for $_POST['timestamp'] is an empty string, exit (the form wasn't submitted by the theme's comments.php)
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: It looks like this form was not submitted by the form at ' . get_option('siteurl') . '.</p>';
	exit;
}
if ( $submitted_timestamp - $comment_timestamp < 60 ) {
// If the form was submitted within 60 seconds of page load, exit
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: The comment was posted too soon after the page was loaded.  Please press the Back button on your browser and try again in a few seconds.</p>'; 
	exit;
}
// If the form was submitted more than 10 minutes after page load, die
if ( $submitted_timestamp - $comment_timestamp > 600 ) {
	header('HTTP/1.1 200 OK');
	echo '<p style="text-align:center;">Error: You waited too long before posting a comment.</p>';
	exit;
}

One could also write a record to a database each time the old wp-comments-post.php file is requested or any of the timestamp checks block a form submission, and pretty quickly generate a list of IP addresses for a black list. At the same time, one could log which timestamp check caught the spam attempt, which is interesting enough that I’ll probably do it eventually.

My clients and I have been receiving increasing amounts of spam sent through our own contact forms. Not being a spammer myself, I’m left to speculate on how one sends spam through a webmail form, but I’ve come up with two ways of preventing it from happening. Both of these methods involve editing the contact form’s HTML and adding a JavaScript file. They also require that legitimate users of the contact form have DOM-compliant browsers with JavaScript enabled.

Defeating human-like robots

For a very long time, I suspected that the spammers’ bots were filling out and submitting forms just like regular human visitors. They would look for input fields with labels like ‘name’ and ‘email’, and, of course, for textarea elements. The bots would enter values into the fields and hit the submit button and move on to the next form.

To combat this, one could institute a challenge-response test in the form of a question that must be correctly answered before the form is submitted. Eric Meyer wrote a very inspiring piece at WP-Gatekeeper on the use of easily human-comprehensible challenge questions like “What is Eric’s first name?” as a way to defeat spambots. There are a number of accessibility concerns and limitations with this method, mostly with respect to choosing a challenge question that any human being (of any mental or physical capacity, speaking any language, etc.) could answer, but that a robot would be unable to recognize as a challenge question or be unable to correctly answer. However, these issues also exist with the CAPTCHA method.

In this case, the challenge question will be What color is an orange? If answered correctly, the form is submitted. If answered incorrectly, the user is prompted to try again.

Here’s how to implement a challenge question method of form validation:

First, create a JavaScript file named ‘validate.js’ with the following lines:

function validateForm()
{
    valid = true;

    if ( document.getElementById('verify').value != "orange" )
    {
        alert ( "You must answer the 'orange' question to submit this form." );
		document.getElementById('verify').value = "";
		document.getElementById('verify').focus();
		valid = false;
    }

    return valid;
}

This script gets the value of the input field with an ID of ‘verify’ and if the value is not the word ‘orange’, the script returns ‘false’ and doesn’t allow the form to post. Instead, it pops up a helpful alert, erases the contents of the ‘verify’ field, and sets the cursor at the beginning of the field.

Add the JavaScript to your HTML with something like:

<head>
...
<script src="validate.js" type="text/javascript"></script>
...
</head>

Next, modify the form to call the function with an onSubmit event. This event will be triggered when the form’s Submit button is activated. Add an input field with the ID ‘verify’ and an onChange event to convert the value to lowercase. Add the actual challenge question as a label.

<form id="contactform" action="../webmail.php" method="post" onsubmit="return validateForm();">
...
	<div><input type="text" name="verify" id="verify" value="" size="22" tabindex="1" onchange="javascript:this.value=this.value.toLowerCase();" /></div>
	<label for="verify">What color is an orange?</label>
...
</form>

A visitor to the site who fills out the form but does not correctly answer the challenge question will not be able to submit the form.

Defeating non-human-like robots

I believe that the challenge-response method is becoming less effective, however. According the article ‘Spamming You Through Your Own Forms‘ by Will Bontrager, the spammers’ bots are not using the form as it is intended.

This is what appears to be happening: Spammers’ robots are crawling the web looking for forms. When the robot finds a form:

  1. It makes a note of the form field names and types.
  2. It makes a note of the form action= URL, converting it into an absolute URL if needed.
  3. It then sends the information home where a database is updated.

Dedicated software uses the database information to insert the spammer’s spew into your form and automatically submit it to you.

His response is to stop the process at step 2 by eliminating the bots’ access to the webmail script. He suggests doing this by hiding the URL of the webmail script in an external JavaScript file, then using JavaScript to delay the writing of the form’s action attribute for a moment. The robots parsing just the page’s HTML never locate the URL to the webmail script, so it is never available for the spammers to exploit.

While I like the idea, I think I’ve come up with a better way of implementing it.

First, rename the webmail script, because the spammers already know the name and location of that script. For example, if GoDaddy is your host, contact forms on your site may be handled by ‘gdform.php’, located in the server root. You’ll need to rename that to something else. For purposes of illustration, I’ll rename the script ‘safemail.php’, but a string of random hexadecimal characters would be even better.

Next, give your contact form an ID. If you are running WordPress or other blogging software, be sure to give the contact form a different ID than the comment form, or else the JavaScript will cause the comment form to post to the webmail script. I’ll give my contact form the ID ‘contactform’.

<form id="contactform" action="../gdform.php" method="post">

We want to prevent the spammers from learning about the newly renamed script. This is done by giving the URL to a fake webmail script as the form’s action attribute and using JavaScript to change the action attribute of the form to the real webmail script only after some user interaction has occurred. I’ll use ‘no-javascript.php’ as my fake script.

To accommodate visitors who aren’t using JavaScript, the fake script could instead be a page explaining that JavaScript is required to submit the contact form and offering an alternate way to contact the author.

Edit the contact form’s action attribute to point to the fake script.

<form id="contactform" action="no-javascript.php" method="post">

Create a new, external JavaScript file called ‘protect.js’, with the following lines:

function formProtect() {
	document.getElementById("contactform").setAttribute("action","safemail.php");
}

The function formProtect, when called, finds the HTML element with ID ‘contactform’ and changes its ‘action’ attribute to ‘safemail.php’. Obviously, one could make this script more complex and potentially more difficult for spammers to parse through the use of variables, but I don’t see that as necessary at this point.

Add the JavaScript to your HTML with something like:

<head>
...
<script src="formprotect.js" type="text/javascript"></script>
...
</head>

Finally, call the script at some point during the process of filling out the form. Exactly how you want to do this is up to you, and it’ll be effective longer if you don’t share how you do it. Perhaps the most straight-forward way would be to call the script at the point of submission by adding onsubmit="return formProtect();" to the <form> element.

<form id="contactform" action="no-javascript.php" method="post" onsubmit="return formProtect();">

If you want to use both the challenge question and the action rewriting functions, you may want to combine them into a single file or trigger formProtect separately with an event on one of the required input fields. If you decide to trigger formProtect with an event other than onsubmit, consider usability/accessibility issues—not everyone uses a mouse.

In conclusion

By implementing both of these methods, it is possible to dramatically reduce or even completely stop contact form spam. In the two months since I implemented this system, I haven’t received a single spam email from any of my contact forms.

The challenge-response test should deter or at least hinder human spammers and robots that fill out forms as though they were human. The trade-off is some added work for legitimate users of the form.

The action attribute rewriting method should immediately eliminate all spam sent directly to your form by spammers who have the URL of your webmail script in their databases. It should also prevent the rediscovery of the URL. Visitors with JavaScript enabled won’t be aware of the anti-spam measures.

For WordPress users

Defeating WordPress comment spam explains how to apply the attribute rewriting method to your WordPress site.