Monthly Archives: September 2007

Comment spam comes from humans who are paid to post it and robots/scripts that do it automatically. The majority of spam comes from the bots. There’s very little one can do to defend against a determined human being, but bots tend to behave predictably, and that allows us to develop countermeasures.

From my observations, it seems that the spambots are first given a keyword phrase to hunt down. They go through the search engine results for pages with that keyword phrase, and follow the link to each page. If the page happens to be a WordPress post, they pass their spammy content to the comment form. Apparently, they do this in a few different ways. Some bots seem to inject their content directly into the form processing agent, a file in the blog root named wp-comments-post.php, using the WordPress default form field IDs. Other bots seem to fill in any fields they come across before submitting the form. Still others seem to fill out the form, but ignore any unexpected text input fields. All of these behaviors can be used against the spammers.

One anti-spam technique that has been used for years is to rename the script that handles the form processing. If you look at the HTML of a WordPress post with comments enabled, you’ll see a line that reads:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

The ‘wp-comments-post.php’ file handles the comment form processing, and a good amount of spam can be avoided by simply renaming it. Many bots will try to pass their content to directly to that script, even if it no longer exists, to no effect.

More sophisticated bots will look through the HTML for the URL of the form processing script, and in doing so will learn the URL of the newly renamed script and pass their contents to that script instead. The trick is to prevent these smarter bots from discovering the URL of the new script. Because it seems that the bots aren’t looking through external JavaScript files yet, that’s where we will hide the URL. (If you do use this technique, it would be very considerate to tell your visitors that JavaScript is required to post comments.)

Step 1

Rename the wp-comments-post.php file to anything else. Using a string of random hexadecimal characters would be ideal. Once you’ve renamed the file, enter the address of the file in your browser. The page should be blank; if you get a 404 error, something is wrong. Make a note of that address, because you’ll need it later. Verify that the wp-comments-post.php file is empty or is no longer on your server.

(Because I was curious about how many bots were hitting the wp-comments-post.php file directly, I replaced the code with a hitcounter. Sure enough, bots are still hitting the file directly, even though there is no longer any path leading to it.)

Step 2

Open up the ‘comments.php’ file in your theme directory. Find the line:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

and change the value of the action attribute to a number sign (or pound sign, or hash), like so:

<form action="#" method="post" id="commentform">

Any bots that come to the page and search for the path to your comment processing script will just see the hash, so they will never discover the URL to the real script. This change also means that if a bot or a visitor tries to submit the form, the form will fail, because a WordPress single post page isn’t designed to process forms. We want the bots to fail, but we’ll need to put things right for humans.

If you are tempted to designate a separate page for the action value, note that the only people likely to ever see this page are visitors without JavaScript enabled who fill out the form.

Step 3

Create a new JavaScript file with the following code.

function commentScriptReveal() {
	// enter the URL of your renamed wp-comments-post.php file below
	var scriptPath = "http://yourdomain.com/renamed-wp-comments-post.php";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

Enter the address of your renamed file as the value for the variable scriptPath. The function commentScriptReveal, when called, will find the element with the ID ‘commentform’ (that’s the comment form) and change its action attribute to the URL of the renamed file, allowing the form to be successfully sent to the processing agent.

Save the file as ‘commentrevealer.js’ and upload it to the /scripts/ directory in your blog’s root. Add the script to your theme’s header.php file:

<script src="<?php echo bloginfo('url'); ?>/scripts/commentrevealer.js" type="text/javascript"></script>

Now we just need to decide how to call the commentScriptReveal function.

Step 4

The ideal method of calling the function would be the one where the human visitor always calls the function, and the bot never calls it. To do this, we need to know something about how the bots work.

Step 4a — For spam bots that ignore unexpected text input fields:

If the bots ignore unexpected text input fields, we can simply add a field, label it ‘required’, and attach the script revealer to that field with one of the following event handlers:

  • onchange triggered when the user changes the content of a field
  • onkeypress triggered when a keyboard key is pressed or held down
  • onkeydown triggered when a keyboard key is pressed
  • onkeyup triggered when a keyboard key is released
  • onfocus triggered when an element gets focus
  • onblur triggered when an element loses focus

I’m intentionally vague about how to trigger the function commentScriptReveal because this technique will be efficacious longer if different people use different events. Furthermore, the text input field doesn’t necessarily need to do anything, its contents will just be discarded when the form is processed. In fact, it doesn’t even need to be a text input field. It can be any form control—a button, a checkbox, a radio button, a menu, etc. We just need human visitors to interact with it somehow. Those bots that skip over the control won’t trigger the revealer event, and your visitors (who always follow directions) will.

If everyone goes about implementing this method in a slightly different way, the spammers should find it much more difficult to counter.

For further reading on JavaScript events: QuirksMode – Javascript – Introduction to Events.

Step 4b — For spam bots that add text to every input field they come across:

If the bots are hitting every text input field with some text, follow Step 4a, and then create a second JavaScript file, named ‘commentconcealer.js’, with the following code:

function commentScriptConceal() {
	// enter the URL of your renamed wp-comments-post.php file below
	var scriptPath = "#";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

The function commentScriptConceal re-rewrites the action attribute back to “#“.

Upload the file and add the script to your theme’s header.php file:

<script src="<?php echo bloginfo('url'); ?>/scripts/commentconcealer.js" type="text/javascript"></script>

Add another text input field somewhere below the one you added in Step 4a. Hide this field from visitors with {display: none;}. Call the function with an onfocus (or onblur, etc.) event on the second input field:

<p style="display: none;"><input type="text" name="reconceal" id="reconceal" value="" size="22" onfocus="return commentScriptConceal()" />
<label for="reconceal"><small>OMG Don't Touch This Field!?!?</small></label></p>

The legitimate visitors will never trigger this, but any bot that interacts with every field on a page will.

Step 5

If you chose to use a text input field in Step 4a, consider making that a challenge-response test. It’s always a good idea to use a server-side check to back up a client-side check. I like to ask the question “What color is an orange?” as a tip of the hat to Eric Meyer. This challenge-response test can be integrated into wp-comments-post.php, so that if the user fails the test, the form submission dies in the same way it would if a required field were left blank.

For this example, let’s ask the question “What color is Kermit the Frog?” The answer, of course, is green.

Open your wp-comments-post.php file, and (in WP version 2.2.2) somewhere around line 31 add:

$comment_verify = strtolower(trim($_POST['verify']));

This will trim any whitespace from the answer to the challenge question, and convert the characters to lowercase.

Around line 45, find the lines:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
}

And add the following lines to them, as so:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
	if ( $comment_verify != 'green' )
		wp_die( __('Sorry, you must correctly answer the "Kermit" question to post a comment.') );
}

This will check the visitor’s answer against the correct answer, and if they don’t match, the script will stop and the comment won’t be submitted. The visitor can hit the back button to change the answer.

Now we need to add the challenge question to the theme file comments.php. Find the lines:

<p><input type="text" name="url" id="url" value="<?php echo $comment_author_url; ?>" size="22" tabindex="3" />
<label for="url"><small>Website</small></label></p>

And right below them, add the following lines:

<p><input type="text" name="verify" id="verify" value="" size="22" tabindex="4" />
<label for="verify"><small>What color is Kermit the Frog? (anti-spam) (required)</small></label></p>

(You may need to update all the tabindex numbers after you add this field.)

That’s it, your visitors will now have to correctly answer the Kermit the Frog question to submit the form.

Further customization

The methods described here just scratch the surface of what can be done to obfuscate the comment handling script. For example, the URL could be broken up into parts and saved as multiple variables. It could have a name comprised of numbers, and the commentScriptReveal JavaScript could perform math to assemble it.

I can’t imagine I’m the first person to come up with this, but… the user could be required to complete a challenge-response question, the correct answer to which would be used as part of the name of the script—the URL to the script doesn’t exist anywhere until the visitor creates it.

But, there are some people who don’t want to impose even a minor inconvenience on their visitors. If you don’t like challenge-response tests, what about using JavaScript to invisibly check the screen resolution of the user agent? And we haven’t even considered using cookies yet.

Credits

Many thanks to Jeff Barr for demonstrating how to put a challenge-response test into the wp-comments-post.php file in his post WordPress Comment Verification (With Source Code). I had been using only JavaScript for validation up until that point, shame on me.

Many thanks also to Will Bontrager for writing a provocative explanation of how to temporarily hide the form processing script using JavaScript in Spamming You Through Your Own Forms.

Huge thanks to Eric Meyer, who wrote a pre-plugin, challenge-response test script called WP-Gatekeeper for a very early version of WordPress.

I’ve described two similar methods for defeating contact form spam by hiding the webmail script in an eariler post.

This plugin is once again actively supported. Please download the latest version from http://wordpress.org/extend/plugins/sociable/.

I’ve fixed some errors that I was experiencing with version 2.0 (dated 2007-02-02) of the Sociable WordPress plugin by Peter Harkins. Specifically, when running WordPress 2.2+, I would get the following warnings when saving changes:

Warning: implode() [function.implode]: Bad arguments. in PATH\wp-content\plugins\sociable\sociable.php on line 651

Warning: Invalid argument supplied for foreach() in PATH\wp-content\plugins\sociable\sociable.php on line 762

For each button, I’d get another error:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in PATH\wp-content\plugins\sociable\sociable.php on line 797

I think I’ve narrowed the cause down to the way the plugin was writing the newly selected options to the database.

In addition to fixing those warnings, I also corrected a few behaviors:

  1. The StumbleUpon button didn’t send the URL to the correct address at http://www.stumbleupon.com/. The button now works correctly, and also sends the page’s title.
  2. In Options -> Sociable, leaving the field for “text displayed in front of the icons” blank now results in no extraneous code being inserted into the page. Leaving the field blank also eliminates the popup “These icons link to social bookmarking sites…” text.
  3. The links to the social networking sites are now all rel="nofollow".