Category Archives: Web Site Dev

Posts concerning web site design and development. Examples of PHP, XHTML, and JavaScript code. WordPress-related posts may be cross-categorized here, but also have their own category.

Resolving transaction concurrency issues in a PHP+MySQL multi-user environment

I’ve been developing a PHP/MySQL web application that will be accessed by multiple users. These users will be both viewing and editing records in the database. Obviously, any situation in which multiple users may be performing operations on the same record puts the integrity of the data at risk.

In the case of my application, there is a very real possibility (a certainty, actually) that two or more people will open the same record at the same time, make changes, and attempt to save these changes. This common concurrent execution problem is known as the “lost update”.

User A opens record “A”.
User B opens record “A”.
User A saves changes to record “A”.
User B saves changes to record “A”.

In this example, User A’s changes are lost – replaced by User B’s changes.

I started to look for a method of preventing this sort of data loss. At first, I wanted to lock the record using a sort of check-in/check-out system. User A would check-out record “A”, and then have exclusive write access to that record until it was checked back in as part of the saving process. There were a number of problems with this method, foremost that User A may decide to not make any changes and so not save the record, which would leave the record in a checked-out state until further administrative action was taken to unlock it.

For a while, I tried to come up with some ingenious way around this, which usually boiled down to somehow automatically unlocking the record after a period of time. But this is not a satisfactory solution. For one thing, a user may have a legitimate reason to keep the record checked-out for longer periods. For another, User B shouldn’t have to wait for a time-out event to occur if User A is no longer in the record.

So what I eventually came up with is a method of checking whether a record has been changed since it was accessed by the user each time a save is initiated. My particular way of doing this involves comparing timestamps, but other ways exist.

Here’s how I’m implementing my solution to the lost update concurrency issue:

User A creates record “A” and saves it at 9:00 AM. A “last-saved timestamp” of 9:00 AM is generated and saved to the record.

User A opens record “A” at 10:00 AM. An “opened timestamp” of 10:00 AM is generated and written to a hidden (or readonly) input field on the HTML page.
User B opens record “A” at 10:00 AM. An “opened timestamp” of 10:00 AM is generated and written to a hidden (or readonly) input field on the HTML page.

At 10:30 AM, User A attempts to save the record. The “last-saved timestamp” is retrieved from the record. The “opened timestamp” of 10:00 AM is compared to the “last-saved timestamp” of 9:00 AM. Because the record has not been changed since it was opened, the record is saved. A new “last-saved timestamp” of 10:30 AM is generated and saved to the record.

At 11:00 AM, User B attempts to save the record. The “last-saved timestamp” is retrieved from the record. The “opened timestamp” of 10:00 AM is compared to the “last-saved timestamp” of 10:30 AM (User A’s timestamp). Because the record has been changed since it was opened by User B, User B is not allowed to save the record.

User B will have to re-open record “A”, consider the effect that User A’s changes may have, and then make any desired changes.
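
The comparison at the heart of this walkthrough can be sketched in a few lines of PHP. This is only an illustration: the function name can_save_record and the example date are made up for the sketch, and a real application would read the last-saved timestamp from the record itself. It also assumes a save made in the same second the record was opened counts as "not changed", which is a judgment call.

```php
<?php
// Hypothetical helper: returns true when the record has not been saved
// since the user opened it. Both arguments are Unix timestamps generated
// by PHP on the server.
function can_save_record($opened_ts, $last_saved_ts) {
	return $last_saved_ts <= $opened_ts;
}

// Replay the example above (the date itself is arbitrary)
$last_saved = strtotime('2008-01-07 09:00:00'); // User A creates the record
$opened_a   = strtotime('2008-01-07 10:00:00'); // User A opens it
$opened_b   = strtotime('2008-01-07 10:00:00'); // User B opens it

// 10:30 AM: User A saves. Nothing has changed since 10:00, so the save is allowed.
$a_allowed = can_save_record($opened_a, $last_saved);
if ($a_allowed) {
	$last_saved = strtotime('2008-01-07 10:30:00');
}

// 11:00 AM: User B saves. The record changed at 10:30, so the save is refused.
$b_allowed = can_save_record($opened_b, $last_saved);

var_dump($a_allowed, $b_allowed);
```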

Unless I’m missing something, this assures that the data from the earlier save will not be overwritten by the later save. To keep things consistent, I’m using PHP to generate all of my timestamps from the server clock, as JavaScript time is based on the user’s system time and is therefore wholly unreliable.
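
One caveat worth considering: if the last-saved timestamp is read in one step and the record is written in another, two saves arriving at almost the same moment could both pass the check. A minimal sketch of one way to close that gap is to put the comparison into the UPDATE statement itself. Everything named here is an assumption for illustration: the table records, its columns id, body, and last_saved, the function name, and the use of PDO.

```php
<?php
// A sketch only: the table name `records` and its columns (id, body,
// last_saved) are hypothetical. Timestamps are Unix timestamps generated
// by PHP on the server.
function save_if_unchanged(PDO $db, $id, $body, $opened_ts, $now) {
	// The WHERE clause repeats the timestamp comparison, so the check and
	// the write happen in a single statement.
	$stmt = $db->prepare(
		'UPDATE records SET body = ?, last_saved = ? ' .
		'WHERE id = ? AND last_saved <= ?'
	);
	$stmt->execute(array($body, $now, $id, $opened_ts));

	// One affected row: the save went through.
	// Zero affected rows: the record was saved by someone else after it
	// was opened (or the record does not exist).
	return $stmt->rowCount() === 1;
}
```

When the function returns false, the application would show User B the message asking them to re-open the record, exactly as in the walkthrough.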

The main drawback that I see is extra work for User B, who now has to review the record as saved by User A before deciding what changes to make. But this is going to be necessary anyway, as changes made between when User B opened and attempted to save the record may influence User B’s update.

The strange thing is that I haven’t seen this offered as a solution on any of the pages I found while Googling for solutions to the access control, lost update and other concurrency-related data loss problems. Lots of people acknowledge the problem and potential for data loss, but few offer solutions on the application level – preferring to rely on a database engine’s ability to lock rows.

How to Geolocate Visitors Using an IP-to-Country Database

In this post, I’ll illustrate how to use the IP-to-Country database available from http://ip-to-country.webhosting.info/ to identify the real-world geographic location of visitors to a web page (geolocate) based on their IP addresses. Once you know where a visitor is physically located, you can do all sorts of nifty things, such as send them location-aware content (think language, currency, etc.).

The IP-to-Country Database

There are a number of databases that associate IP address ranges with countries. I’ll be using the one at http://ip-to-country.webhosting.info/. It is available as a CSV file, so the first step will be getting the contents migrated into a MySQL database.

The MySQL Part

First, create a MySQL database. I like phpMyAdmin, but use whatever method you are comfortable with. Name the database ip2country.

Once the database has been created, we need to create a table. In phpMyAdmin, click on the SQL tab and enter the following lines:

CREATE TABLE `ip2country` (
  `ipFrom` int(10) unsigned NOT NULL default '0',
  `ipTo` int(10) unsigned NOT NULL default '0',
  `country2` char(2) NOT NULL default '',
  `country3` char(3) NOT NULL default '',
  `country` varchar(25) NOT NULL default ''
);

This creates a table, also called ip2country, and five fields (ipFrom, ipTo, country2, country3, and country) to hold the data from the CSV.

The next step is to get the contents of the CSV file into the MySQL database, and there are two ways to do this.

If you are using MySQL version 5.0 or greater, the fastest way is to use the LOAD DATA INFILE statement. The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. (For details, visit http://dev.mysql.com/doc/refman/5.0/en/load-data.html.) On my computer, it takes less than half a second to get all 79,000+ rows into the MySQL database. To use this method, click on the SQL tab and enter the following lines, changing the path to “ip-to-country.csv” to the actual path to the file on your local computer.

LOAD DATA INFILE 'D:/PATH/TO/FILE/ip-to-country.csv' INTO TABLE ip2country
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY """"
LINES TERMINATED BY "\n";

The other way to populate the MySQL database is by running a PHP script that writes the records from the CSV one line at a time. To use this method, upload the CSV to a directory on your web site, then create a PHP file with the following lines, editing the script to use your database address, username, and password:

<?php

	// Adjust for the database address, username, password
	// (the mysqli extension is used because the older mysql_* functions were removed in PHP 7)
	$link = mysqli_connect('localhost', 'root', '', 'ip2country');
	if (!$link) {
		die('Could not connect: ' . mysqli_connect_error());
	}

	// Set the variable $row to zero to begin counting entries
	$row = 0;

	// The ip-to-country.csv must be in the same directory as this php file
	$handle = fopen("ip-to-country.csv", "r");
	if (!$handle) {
		die('Could not open ip-to-country.csv');
	}

	// Required to prevent timeout
	set_time_limit(300);

	// Prepare the INSERT once; the bound values are escaped by the database driver
	$stmt = mysqli_prepare($link, "INSERT INTO ip2country(`ipFrom`, `ipTo`, `country2`, `country3`, `country`) VALUES(?, ?, ?, ?, ?)");
	if (!$stmt) {
		die('Could not prepare query: ' . mysqli_error($link));
	}

	// While rows exist, write each into the database
	while ($data = fgetcsv($handle, 1000, ",")) {
		mysqli_stmt_bind_param($stmt, 'sssss', $data[0], $data[1], $data[2], $data[3], $data[4]);
		mysqli_stmt_execute($stmt) or die("Invalid query: " . mysqli_stmt_error($stmt).__LINE__.__FILE__);
		$row++;
	}

	// Close the CSV file and the database connection
	fclose($handle);
	mysqli_stmt_close($stmt);
	mysqli_close($link);

	// Print a confirmation
	echo "All done! " . $row . " rows added to database.";

?>

Upload the PHP script to the same directory as the CSV file and then browse to it. It will migrate the contents of the CSV file into the MySQL database and then give a little confirmation of how many rows were added when it is done.

If your script times out, you may need to either increase the value of set_time_limit(300); or move that part of the script inside the while loop.

The PHP Part

Now that the database is in place, it’s time to do something with it. The following PHP script will get the visitor’s IP address and print out the corresponding country.

<?php

	// Figure out the visitor's IP address
	$ip = $_SERVER['REMOTE_ADDR'];

	// Establish a database connection (adjust address, username, and password)
	// (the mysqli extension is used because the older mysql_* functions were removed in PHP 7)
	$dbh = mysqli_connect("localhost", "root", "", "ip2country") or die("Could not connect: " . mysqli_connect_error());

	// Create a query with placeholders for the IP address
	$stmt = mysqli_prepare($dbh, "SELECT country2, country FROM ip2country WHERE ipFrom<=INET_ATON(?) AND ipTo>=INET_ATON(?)");
	mysqli_stmt_bind_param($stmt, 'ss', $ip, $ip);

	// Execute the query
	mysqli_stmt_execute($stmt);

	// Bind the country code and country name columns to variables and fetch the first matching row, if any
	mysqli_stmt_bind_result($stmt, $country_code, $country_name);
	$found = mysqli_stmt_fetch($stmt);

	// Close the statement and the database connection
	mysqli_stmt_close($stmt);
	mysqli_close($dbh);

	// If the database contains a match, print out the country name and country code, otherwise print the IP address
	if ($found) {
		echo '<p>The IP-to-Country database contains a match for your ip address: ' . htmlspecialchars($ip) . '</p>';
		echo '<p>You are located in ' . htmlspecialchars($country_name) . ', and the country code is ' . htmlspecialchars($country_code) . '</p>';
	}else{
		echo '<p>Sorry. The IP-to-Country database does not contain a match for your ip address: ' . htmlspecialchars($ip) . '</p>';
	}

?>
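
To see what the query is comparing, it may help to look at the number that INET_ATON() produces for a dotted IPv4 address. The sketch below does the same conversion in PHP with ip2long(). The helper names ip_to_number and ip_in_range are made up for this illustration, and the range used in the example is the reserved documentation block 192.0.2.0/24, not a row from the IP-to-Country database.

```php
<?php
// Hypothetical helper: convert a dotted-quad IPv4 address to the unsigned
// number that INET_ATON() produces, or null if the input is not valid IPv4.
function ip_to_number($ip) {
	$long = ip2long($ip);
	if ($long === false) {
		return null;
	}
	// ip2long() can return a negative number on 32-bit builds of PHP;
	// sprintf('%u') reinterprets it as unsigned.
	return (float) sprintf('%u', $long);
}

// Hypothetical helper: true when $ip falls inside $from..$to (inclusive),
// which mirrors the ipFrom<=... AND ipTo>=... test in the query.
function ip_in_range($ip, $from, $to) {
	$n = ip_to_number($ip);
	return $n !== null && $n >= ip_to_number($from) && $n <= ip_to_number($to);
}

echo ip_to_number('192.168.1.1') . "\n"; // 3232235777
var_dump(ip_in_range('192.0.2.57', '192.0.2.0', '192.0.2.255')); // bool(true)
```

The value 3232235777 is larger than a signed 32-bit integer can hold, which is why the ipFrom and ipTo columns need to be able to store unsigned values.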

Summary

That’s it. You now have a way to determine each visitor’s physical location. Use geolocation carefully, and always provide a fall-back in the event the database does not contain a given IP. IP addresses are constantly being assigned and revoked, so keeping your database up-to-date is critical.

Using timestamps to reduce WordPress comment spam

Update 8.27.11: The method described in this post uses PHP to generate the timestamps. If your site is using a caching plugin, the timestamps in the HTML will be stale, and this method will not work. Please see my updated post at A cache-friendly method for reducing WordPress comment spam for a new method using JavaScript for sites that use page caching.

In this post, I’ll explain how to reduce the amount of comment spam your WordPress blog receives by using an unobtrusive ‘handshake’ between the two files necessary for a valid comment submission to take place. I’ve written a few different articles on reducing comment spam by means of a challenge response test that the visitor must complete before submitting a comment, but I’m now looking for ways to achieve the same results while keeping the anti-spam method invisible to the visitor.

I’m a big fan of Akismet, but I also want to block as much spam as possible before it is caught by Akismet in order to reduce the number of database entries.

One thing this method does not do is rename and hide the path to the form processing script, but it makes that technique obsolete, anyway.

In the timestamp handshake method, a first timestamp is generated and written as a hidden input field when the post page loads. When the comment is submitted, a second timestamp is generated by the comment-processing script and it and the page-load timestamp are saved as variables. If the page-load timestamp variable is blank, which should be the case if the spambot uses any other page to populate the comment, the script will die. The page-load timestamp is then subtracted from the comment-submission timestamp. If the comment was submitted less than 60 seconds after the post page was loaded, the script dies with a descriptive error message. Hopefully, this will separate the bots’ comments from those left by thoughtful human visitors who have taken the time to read your post. If a human visitor does happen to submit a comment within 60 seconds of the page loading, he or she can click his or her browser’s back button and try resubmitting the comment again in a few seconds.

One drawback is that this method does involve editing a core file – wp-comments-post.php. You’ll have to re-edit it each time you upgrade WordPress, which is a nuisance, I know. The good thing is that if you forget to do this, people can still comment – you just won’t have the anti-spam protection.

Note that the instructions in the following steps are based on the code in WordPress version 2.3 and the Kubrick theme included with that release. You may need to adjust for your version of WordPress.

Step 1 – Add the hidden timestamp input field to the comment form

Open the comments.php file in your current theme’s folder and find the following lines:

<p><textarea name="comment" id="comment" cols="100%" rows="10" tabindex="4"></textarea></p>

<p><input name="submit" type="submit" id="submit" tabindex="5" value="Submit Comment" />

Add the following line between them:

<p><input type="hidden" name="timestamp" id="timestamp" value="<?php echo time(); ?>" size="22" /></p>

Step 2 – Modify the wp-comments-post.php file to create the second timestamp and perform the comparison

Open wp-comments-post.php and find the lines:

$comment_author       = trim(strip_tags($_POST['author']));
$comment_author_email = trim($_POST['email']);
$comment_author_url   = trim($_POST['url']);
$comment_content      = trim($_POST['comment']);

Immediately after them, add the following lines:

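$comment_timestamp    = isset($_POST['timestamp']) ? trim($_POST['timestamp']) : '';
$submitted_timestamp  = time();

// A missing, blank, or non-numeric timestamp means the form was not loaded from the post page
if ( $comment_timestamp == '' || !ctype_digit($comment_timestamp) )
	wp_die( __('Hello, spam bot!') );
	
if ( $submitted_timestamp - (int) $comment_timestamp < 60 )
	wp_die( __('Error: you must wait at least 1 minute before posting a comment.') );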

That’s it; you’re done.

Credits

Thanks to Jonathan Bailey for suggesting the handshake in his post at http://www.plagiarismtoday.com/2007/07/24/wordpress-and-comment-spam/.

A collection of PHP code snippets

This is a collection of php code snippets that seem to come in handy rather often. They are assembled here more for my own organization than anything else.

String: trim and convert to lowercase

A very straightforward but useful snippet. A string is first trimmed of any leading or trailing white space, and then converted to lowercase letters. Good for normalizing user input.

<?php
$string = "Orange";
$string = strtolower(trim($string));
echo $string;
?>

String: truncate and break at word

This will attempt to shorten a string to $length characters, but will then lengthen the string (if necessary) to break at the next whole word and then append an ellipsis to the end of the string. Good for shortening readable text while keeping it looking pretty.

<?php 
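function truncate($string, $length) {
	if (strlen($string) > $length) {
		// Look for the next space at or after $length
		$pos = strpos($string, " ", $length);
		// No space left means the text ends inside the last word, so return it unchanged
		if ($pos === false) {
			return $string;
		}
		return substr($string, 0, $pos) . "...";
	}
	
	return $string;
}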

	echo truncate('the quick brown fox jumped over the lazy dog', 10);
?>

In the above example, the resultant output will be the quick brown…, because the search for a space starts at offset 10 (the ‘b’ in ‘brown’), so the string is cut at the first space after ‘brown’.

What season is it?

Note that this is only a very rough approximation of when a season begins and ends. This snippet would be good for rotating a seasonal background or something, but it’s not astronomically correct, and I wouldn’t use it as a calendar. Reckoning a season is rather complex.

<?php echo "It is day " . (date('z') + 1) . " of the year. <br />"; ?>
<?php $theday = (int) date('z'); // date('z') counts from 0, so January 1 is day 0
	if ($theday >= 79 && $theday <= 171) {
		$season = "Spring";
	} elseif ($theday >= 172 && $theday <= 264) {
		$season = "Summer";
	} elseif ($theday >= 265 && $theday <= 355) {
		$season = "Autumn";
	} else {
		$season = "Winter";
	}
	echo "It's " . $season . "!";
?>

Get the number of days since something happened

This function takes a date (formatted as a Unix timestamp) and calculates the number of days since that date. Both times are normalized to 12:00:00 AM, so the result does not vary with the time of day the function is called. The difference is rounded with round() rather than floor(), because a daylight saving time change between the two dates makes the difference an hour more or less than a whole number of days.

function calc_days_ago($date){
	// The function accepts a date formatted as a Unix timestamp
	
	// First, normalize the current date down to the Unix time at 12:00:00 AM (to the second)
	$now = time() - ( (date('G')*(60*60)) + date('i')*60 + date('s') );
	// Second, normalize the given date down to the Unix time at 12:00:00 AM (to the second)
	$then = $date - ( (date('G', $date)*(60*60)) + date('i', $date)*60 + date('s', $date) );
	$diff = $now - $then;
	// Round to the nearest whole day to absorb the hour gained or lost at a daylight saving change
	$days = (int) round($diff/(24*60*60));
	switch ($days) {
	case 0:
 		$days_ago = "today";
		break;
	case 1:
		$days_ago = $days . " day ago";
		break;
	default:
		$days_ago = $days . " days ago";
	}
	return $days_ago;
}

Get the hours and minutes remaining until something happens

This function takes a time (formatted as a Unix timestamp) and calculates the number of hours and minutes remaining until that time. If the time has already passed, the function returns “historical”. Example outputs would be “7 hours”, “6 hours and 34 minutes”, and “12 minutes”. The hours and minutes are computed with whole-number arithmetic on the seconds remaining, so partial minutes are dropped; if less than a minute remains, the function returns “less than a minute”.

function calc_time_left($date){
	// The function accepts a date formatted as a Unix timestamp

	$now = time();
	$event = $date;
	if ($event >= $now) {
		$diff = $event - $now;
		// Whole hours remaining, then whole minutes left over after those hours
		$hours = (int) floor($diff / (60*60));
		$minutes = (int) floor(($diff % (60*60)) / 60);
		// Assemble the hours string, if any
		if ($hours > 0) {
			$thehours = $hours . (($hours == 1) ? " hour" : " hours");
		}else{
			$thehours = "";
		}
		// Assemble the minutes string, if any
		if ($minutes > 0) {
			$theminutes = $minutes . (($minutes == 1) ? " minute" : " minutes");
		}else{
			$theminutes = "";
		}
		// Join the two parts only when both are present
		$sep = ($thehours && $theminutes) ? " and " : "";
		$timeleft = $thehours . $sep . $theminutes;
		if ($timeleft == "") {
			$timeleft = "less than a minute";
		}
	}else{
		$timeleft = "historical";
	}
	return $timeleft;
}

Get the path of the containing directory

This one really comes in handy. It will give you the URL of the folder where the executing script resides, so you can reference the full path to other files in that folder, no matter where the folder may be located. It works on both Linux and Windows servers, and it adds a trailing slash to the path if one doesn’t already exist, so that root looks the same as a subfolder.

<?php 
function get_path() {
	// Get the path of the folder where the executing script resides, with the trailing slash
	
	// Determine HTTPS or HTTP
	$url = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
	$url .= $_SERVER['HTTP_HOST'] . dirname($_SERVER['PHP_SELF']);
	// Convert the trailing backslash (on Windows root) to a forward slash
	$url = str_replace('\\', '/', $url);
	// Determine whether the current location is root by looking for a trailing slash (Windows or Linux)
	if (strlen($url) != strrpos($url, '/') +1) {
		$url .= '/';
	}
	return $url;
}
?>

Centering unordered list items

I wrote this script because I wanted to center the thumbnails in the Plogger image gallery while still using an unordered list item to contain each thumbnail. The script figures out how many thumbnails exist on a page and how many will fit in the space provided, then adds sufficient left padding to each to give the appearance of them being centered. It can be easily adapted for other uses. The full explanation and code example is at //ardamis.com/2007/08/05/centering-the-thumbnails-in-plogger/.

Parse .html as .php (Apache .htaccess)

This isn’t actually a PHP script, but it’s still handy. If you need to write pages with a .html or .htm extension but still want to use PHP in those pages, adding the following line to your .htaccess file will force an Apache server to parse .html files as .php files. I have confirmed this to work with GoDaddy’s hosting (GoDaddy runs PHP as CGI).

AddHandler x-httpd-php .php .htm .html

If you are running an Apache server as part of an XAMPP installation on top of Windows, try using this instead:

AddType application/x-httpd-php .html .htm

Defeating WordPress comment spam

Comment spam comes from humans who are paid to post it and robots/scripts that do it automatically. The majority of spam comes from the bots. There’s very little one can do to defend against a determined human being, but bots tend to behave predictably, and that allows us to develop countermeasures.

From my observations, it seems that the spambots are first given a keyword phrase to hunt down. They go through the search engine results for pages with that keyword phrase, and follow the link to each page. If the page happens to be a WordPress post, they pass their spammy content to the comment form. Apparently, they do this in a few different ways. Some bots seem to inject their content directly into the form processing agent, a file in the blog root named wp-comments-post.php, using the WordPress default form field IDs. Other bots seem to fill in any fields they come across before submitting the form. Still others seem to fill out the form, but ignore any unexpected text input fields. All of these behaviors can be used against the spammers.

One anti-spam technique that has been used for years is to rename the script that handles the form processing. If you look at the HTML of a WordPress post with comments enabled, you’ll see a line that reads:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

The ‘wp-comments-post.php’ file handles the comment form processing, and a good amount of spam can be avoided by simply renaming it. Many bots will try to pass their content directly to that script, even if it no longer exists, to no effect.

More sophisticated bots will look through the HTML for the URL of the form processing script, and in doing so will learn the URL of the newly renamed script and pass their contents to that script instead. The trick is to prevent these smarter bots from discovering the URL of the new script. Because it seems that the bots aren’t looking through external JavaScript files yet, that’s where we will hide the URL. (If you do use this technique, it would be very considerate to tell your visitors that JavaScript is required to post comments.)

Step 1

Rename the wp-comments-post.php file to anything else. Using a string of random hexadecimal characters would be ideal. Once you’ve renamed the file, enter the address of the file in your browser. The page should be blank; if you get a 404 error, something is wrong. Make a note of that address, because you’ll need it later. Verify that the wp-comments-post.php file is empty or is no longer on your server.

(Because I was curious about how many bots were hitting the wp-comments-post.php file directly, I replaced the code with a hitcounter. Sure enough, bots are still hitting the file directly, even though there is no longer any path leading to it.)

Step 2

Open up the ‘comments.php’ file in your theme directory. Find the line:

<form action="http://yourdomain.com/wp-comments-post.php" method="post" id="commentform">

and change the value of the action attribute to a number sign (or pound sign, or hash), like so:

<form action="#" method="post" id="commentform">

Any bots that come to the page and search for the path to your comment processing script will just see the hash, so they will never discover the URL to the real script. This change also means that if a bot or a visitor tries to submit the form, the form will fail, because a WordPress single post page isn’t designed to process forms. We want the bots to fail, but we’ll need to put things right for humans.

If you are tempted to designate a separate page for the action value, note that the only people likely to ever see this page are visitors without JavaScript enabled who fill out the form.

Step 3

Create a new JavaScript file with the following code.

function commentScriptReveal() {
	// enter the URL of your renamed wp-comments-post.php file below
	var scriptPath = "http://yourdomain.com/renamed-wp-comments-post.php";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

Enter the address of your renamed file as the value for the variable scriptPath. The function commentScriptReveal, when called, will find the element with the ID ‘commentform’ (that’s the comment form) and change its action attribute to the URL of the renamed file, allowing the form to be successfully sent to the processing agent.

Save the file as ‘commentrevealer.js’ and upload it to the /scripts/ directory in your blog’s root. Add the script to your theme’s header.php file:

<script src="<?php bloginfo('url'); ?>/scripts/commentrevealer.js" type="text/javascript"></script>

Now we just need to decide how to call the commentScriptReveal function.

Step 4

The ideal method of calling the function would be the one where the human visitor always calls the function, and the bot never calls it. To do this, we need to know something about how the bots work.

Step 4a — For spam bots that ignore unexpected text input fields:

If the bots ignore unexpected text input fields, we can simply add a field, label it ‘required’, and attach the script revealer to that field with one of the following event handlers:

onchange: triggered when the user changes the content of a field
onkeypress: triggered when a keyboard key is pressed or held down
onkeydown: triggered when a keyboard key is pressed
onkeyup: triggered when a keyboard key is released
onfocus: triggered when an element gets focus
onblur: triggered when an element loses focus

I’m intentionally vague about how to trigger the function commentScriptReveal because this technique will remain effective longer if different people use different events. Furthermore, the text input field doesn’t necessarily need to do anything; its contents will just be discarded when the form is processed. In fact, it doesn’t even need to be a text input field. It can be any form control: a button, a checkbox, a radio button, a menu, etc. We just need human visitors to interact with it somehow. Those bots that skip over the control won’t trigger the revealer event, and your visitors (who always follow directions) will.

If everyone goes about implementing this method in a slightly different way, the spammers should find it much more difficult to counter.

For further reading on JavaScript events: QuirksMode – Javascript – Introduction to Events.

Step 4b — For spam bots that add text to every input field they come across:

If the bots are hitting every text input field with some text, follow Step 4a, and then create a second JavaScript file, named ‘commentconcealer.js’, with the following code:

function commentScriptConceal() {
	// reset the form action to the placeholder so the real script URL is hidden again
	var scriptPath = "#";
	document.getElementById("commentform").setAttribute("action", scriptPath);
}

The function commentScriptConceal rewrites the action attribute back to “#”.

Upload the file and add the script to your theme’s header.php file:

<script src="<?php bloginfo('url'); ?>/scripts/commentconcealer.js" type="text/javascript"></script>

Add another text input field somewhere below the one you added in Step 4a. Hide this field from visitors with {display: none;}. Call the function with an onfocus (or onblur, etc.) event on the second input field:

<p style="display: none;"><input type="text" name="reconceal" id="reconceal" value="" size="22" onfocus="return commentScriptConceal()" />
<label for="reconceal"><small>OMG Don't Touch This Field!?!?</small></label></p>

The legitimate visitors will never trigger this, but any bot that interacts with every field on a page will.

Step 5

If you chose to use a text input field in Step 4a, consider making that a challenge-response test. It’s always a good idea to use a server-side check to back up a client-side check. I like to ask the question “What color is an orange?” as a tip of the hat to Eric Meyer. This challenge-response test can be integrated into wp-comments-post.php, so that if the user fails the test, the form submission dies in the same way it would if a required field were left blank.

For this example, let’s ask the question “What color is Kermit the Frog?” The answer, of course, is green.

Open your wp-comments-post.php file, and (in WP version 2.2.2) somewhere around line 31 add:

$comment_verify = isset($_POST['verify']) ? strtolower(trim($_POST['verify'])) : '';

This will trim any whitespace from the answer to the challenge question, and convert the characters to lowercase.

Around line 45, find the lines:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
}

And add the following lines to them, as so:

} else {
	if ( get_option('comment_registration') )
		wp_die( __('Sorry, you must be logged in to post a comment.') );
	if ( $comment_verify != 'green' )
		wp_die( __('Sorry, you must correctly answer the "Kermit" question to post a comment.') );
}

This will check the visitor’s answer against the correct answer, and if they don’t match, the script will stop and the comment won’t be submitted. The visitor can hit the back button to change the answer.

Now we need to add the challenge question to the theme file comments.php. Find the lines:

<p><input type="text" name="url" id="url" value="<?php echo $comment_author_url; ?>" size="22" tabindex="3" />
<label for="url"><small>Website</small></label></p>

And right below them, add the following lines:

<p><input type="text" name="verify" id="verify" value="" size="22" tabindex="4" />
<label for="verify"><small>What color is Kermit the Frog? (anti-spam) (required)</small></label></p>

(You may need to update all the tabindex numbers after you add this field.)

That’s it. Your visitors will now have to correctly answer the Kermit the Frog question to submit the form.

Further customization

The methods described here just scratch the surface of what can be done to obfuscate the comment handling script. For example, the URL could be broken up into parts and saved as multiple variables. It could have a name comprised of numbers, and the commentScriptReveal JavaScript could perform math to assemble it.

I can’t imagine I’m the first person to come up with this, but… the user could be required to complete a challenge-response question, the correct answer to which would be used as part of the name of the script—the URL to the script doesn’t exist anywhere until the visitor creates it.

But, there are some people who don’t want to impose even a minor inconvenience on their visitors. If you don’t like challenge-response tests, what about using JavaScript to invisibly check the screen resolution of the user agent? And we haven’t even considered using cookies yet.

Credits

Many thanks to Jeff Barr for demonstrating how to put a challenge-response test into the wp-comments-post.php file in his post WordPress Comment Verification (With Source Code). I had been using only JavaScript for validation up until that point, shame on me.

Many thanks also to Will Bontrager for writing a provocative explanation of how to temporarily hide the form processing script using JavaScript in Spamming You Through Your Own Forms.

Huge thanks to Eric Meyer, who wrote a pre-plugin, challenge-response test script called WP-Gatekeeper for a very early version of WordPress.

I’ve described two similar methods for defeating contact form spam by hiding the webmail script in an earlier post.

Correcting for line descent in Firefox

I was working on a theme for the image gallery Plogger when an old problem cropped up again. I was adding links to thumbnail images, and had given the anchor a padding of 3 pixels and a 1 pixel border so that the link formed a sort of picture frame around the image. It looked fine in IE7, with a consistent 3 pixels of space on all 4 sides of the image between the image and the anchor’s border. But in Firefox, the top and sides had 3 pixels of space between the image and the anchor’s border, but the bottom had 5 pixels of space. For some reason, an additional 2 pixel gap was appearing below the image, between the image and the bottom border of the anchor.

This extra CSS space below images wrapped in anchor tags had plagued me in the past, and I had always just found a work-around, such as applying the padding and/or border to the image. But this time, I decided to figure out what was going on. I Googled for a while and finally hit on a forum thread that explained the fix. As it turns out, Firefox allows for something called ‘line descent’, which, in typography, is the amount of space below the baseline of a font.

Most scripts share the notion of a baseline: an imaginary horizontal line on which characters rest. In some scripts, parts of glyphs lie below the baseline. The descent spans the distance between the baseline and the lowest descending glyph in a typeface, and the part of a glyph that descends below the baseline has the name “descender”.

http://en.wikipedia.org/wiki/Typeface

So the descender would be the lowest portion of lowercase letters such as ‘g,’ ‘j,’ ‘p,’ ‘q,’ and ‘y’. Why Firefox wants to accommodate that by adjusting the bottom of an anchor tag remains a mystery to me.

The fix is to add img {vertical-align: bottom;} where appropriate to your CSS. In practice, it eliminates the CSS space under the image in Firefox by pulling up the bottom of the anchor, rather than by pushing down the image. There is no apparent change in IE7.

Centering the thumbnails in Plogger

This post illustrates a method of centering the thumbnails in the album view of the PHP image gallery Plogger. The method automatically adjusts for thumbnails of varying widths and pages containing less than a full row of images.

This method should work in any theme that uses the unordered list derived from the default Plogger theme and that has a fixed and determinable width for the element ‘ul.slides’.

Overview of the method

In brief, the template file album.php is edited to add a PHP script that figures out how many thumbnails are in the first row and then adjusts the left margin of each ‘li.thumbnail’ in order to keep the same amount of space on either side of each image. The user is required to manually set 3 variables: $center_thumbs, $total_space and $thumb_padding, and the script does the rest.

Line-by-line documentation

Below is all of the relevant code, to be placed into the theme file ‘album.php’.

<?php plogger_load_picture();
// Set variables for the thumbnails
$capt = plogger_get_picture_caption();
$date = plogger_get_picture_date();
// Find thumbnail width
$thumb_info = plogger_get_thumbnail_info();
$thumb_width = $thumb_info[0]; // The width of the thumbnail image, in pixels.
$thumb_height = $thumb_info[1];	// The height of the thumbnail image, in pixels.
		
// Set album page options
$center_thumbs = 'true'; // When the value of "$center_thumbs" is set to 'true', the theme will center the thumbnail images 
$total_space = 798; // Set the value of $total_space to equal the width, in pixels, of the interior of 'ul.slides' (i.e. after deducting for any padding)
$thumb_padding = 33; // Set the value of $thumb_padding to equal the sum of all padding and borders on 'ul.slides li.thumbnail', the thumbnail image and the anchor tag (this can be tricky to calculate)

$thumbs_on_page = $GLOBALS["available_pictures"]; 
$actual_thumb_width = $thumb_width + $thumb_padding; 
$max_thumbs_per_row = floor($total_space / $actual_thumb_width); 
($thumbs_on_page < $max_thumbs_per_row)? $thumbs_per_row = $thumbs_on_page : $thumbs_per_row = $max_thumbs_per_row ; 
$avail_space = $total_space - ($thumbs_per_row * $actual_thumb_width); 
$left_margin = floor($avail_space / ($thumbs_per_row + 1)); 
?>


<li class="thumbnail"<?php if ($center_thumbs == 'true') echo ' style="margin-left: ' . $left_margin . 'px"'; ?>>

Here’s how the whole thing works, with each code section followed by a description of what’s happening:

$center_thumbs = 'true'; // When the value of "$center_thumbs" is set to 'true', the theme will center the thumbnail images 
$total_space = 798; // Set the value of $total_space to equal the width, in pixels, of the interior of 'ul.slides' (i.e. after deducting for any padding)
$thumb_padding = 33; // Set the value of $thumb_padding to equal the sum of all padding and borders on 'ul.slides li.thumbnail', the thumbnail image and the anchor tag (this can be tricky to calculate)

The centering feature can be toggled on and off by setting $center_thumbs to ‘true’ or any other value. The user will need to specify integer values for $total_space and $thumb_padding, as described in the commented lines.

$thumbs_on_page = $GLOBALS["available_pictures"];

The number of available pictures on the current page is assigned to the $thumbs_on_page variable.

$actual_thumb_width = $thumb_width + $thumb_padding;

The actual width of each ‘li.thumbnail’ is calculated and saved as the variable $actual_thumb_width. The width of the thumbnail image is available to the theme as $thumb_width, but the user must specify a value for $thumb_padding, which is an integer equal to the sum of all of the padding and borders on the thumbnail image, the surrounding anchor tag, and the ul list item ‘.thumbnail’. This can be rather tricky to calculate, so you may want to increase your figure by a few pixels just to be safe.

$max_thumbs_per_row = floor($total_space / $actual_thumb_width);

The maximum possible number of thumbnails per row is calculated by dividing the usable width of ‘ul.slides’ by the actual width of a ‘li.thumbnail’, then rounding the quotient down to the nearest integer using the PHP function floor(). For example, if the quotient of $total_space / $actual_thumb_width is 4.9, $max_thumbs_per_row would equal 4, because 5 thumbnails would be wider than the available space.

($thumbs_on_page < $max_thumbs_per_row)? $thumbs_per_row = $thumbs_on_page : $thumbs_per_row = $max_thumbs_per_row ;

The actual number of thumbs in the first row is calculated and assigned to $thumbs_per_row using a ternary operator. If the number of thumbs on the current page is less than the maximum number possible in a single row, it follows that the row isn’t full, and the number of thumbs on the page is assigned to the variable $thumbs_per_row. However, if the number of thumbs on the page is equal to or greater than the maximum number possible in a single row, it follows that the first row contains the maximum number possible, and so the thumbs will be spaced as though every row is full. Doing it this way maintains the grid, but if one wanted partially filled rows to be centered according to the number of remaining images, that would be possible.

$avail_space = $total_space - ($thumbs_per_row * $actual_thumb_width);

The amount of white space, $avail_space, is calculated by subtracting from the total usable space the sum of the widths of the thumbs in the first row.

$left_margin = floor($avail_space / ($thumbs_per_row + 1));

Finally, the number of pixels for the left margin of each ‘li.thumbnail’, $left_margin, is calculated by dividing $avail_space by the sum of 1 plus the number of thumbs in the first row, and then rounding down the quotient to the next lowest integer. The number of thumbs must be increased by 1 to account for the white space to the right of the last thumbnail.

In practice, the white space to the right of the last thumbnail may be a few pixels wider or narrower than the left margins, but this deviation should be limited to within 4 or 5 pixels.
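To sanity-check the numbers before editing album.php, the same arithmetic can be run in a browser console. This is a JavaScript transcription of the PHP calculation, not part of the theme; the function name centerThumbs and the 120-pixel thumbnail width are assumptions made for the example.

```javascript
// Mirrors the PHP calculation so the margins can be previewed in a console.
function centerThumbs(totalSpace, thumbWidth, thumbPadding, thumbsOnPage) {
    var actualThumbWidth = thumbWidth + thumbPadding;
    var maxThumbsPerRow = Math.floor(totalSpace / actualThumbWidth);
    var thumbsPerRow = Math.min(thumbsOnPage, maxThumbsPerRow);
    var availSpace = totalSpace - thumbsPerRow * actualThumbWidth;
    var leftMargin = Math.floor(availSpace / (thumbsPerRow + 1));
    // Whatever the left margins don't use ends up right of the last thumbnail.
    var rightSpace = availSpace - thumbsPerRow * leftMargin;
    return { thumbsPerRow: thumbsPerRow, leftMargin: leftMargin, rightSpace: rightSpace };
}

// Full page: 798px wide, 120px thumbnails (assumed), 33px padding, 12 thumbs.
console.log(centerThumbs(798, 120, 33, 12)); // thumbsPerRow 5, leftMargin 5, rightSpace 8

// Short page: only 3 thumbs, so the row is centered around those 3.
console.log(centerThumbs(798, 120, 33, 3)); // thumbsPerRow 3, leftMargin 84, rightSpace 87
```

In both cases the space to the right of the last thumbnail is 3 pixels wider than the left margin, which is the rounding leftover from floor().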

Defeating contact form spam by hiding the webmail script

My clients and I have been receiving increasing amounts of spam sent through our own contact forms. Not being a spammer myself, I’m left to speculate on how one sends spam through a webmail form, but I’ve come up with two ways of preventing it from happening. Both of these methods involve editing the contact form’s HTML and adding a JavaScript file. They also require that legitimate users of the contact form have DOM-compliant browsers with JavaScript enabled.

Defeating human-like robots

For a very long time, I suspected that the spammers’ bots were filling out and submitting forms just like regular human visitors. They would look for input fields with labels like ‘name’ and ‘email’, and, of course, for textarea elements. The bots would enter values into the fields and hit the submit button and move on to the next form.

To combat this, one could institute a challenge-response test in the form of a question that must be correctly answered before the form is submitted. Eric Meyer wrote a very inspiring piece at WP-Gatekeeper on the use of easily human-comprehensible challenge questions like “What is Eric’s first name?” as a way to defeat spambots. There are a number of accessibility concerns and limitations with this method, mostly with respect to choosing a challenge question that any human being (of any mental or physical capacity, speaking any language, etc.) could answer, but that a robot would be unable to recognize as a challenge question or be unable to correctly answer. However, these issues also exist with the CAPTCHA method.

In this case, the challenge question will be What color is an orange? If answered correctly, the form is submitted. If answered incorrectly, the user is prompted to try again.

Here’s how to implement a challenge question method of form validation:

First, create a JavaScript file named ‘validate.js’ with the following lines:

function validateForm()
{
    // Declared with var so that 'valid' does not leak into the global scope.
    var valid = true;

    if ( document.getElementById('verify').value != "orange" )
    {
        alert ( "You must answer the 'orange' question to submit this form." );
        document.getElementById('verify').value = "";
        document.getElementById('verify').focus();
        valid = false;
    }

    return valid;
}

This script gets the value of the input field with an ID of ‘verify’ and if the value is not the word ‘orange’, the script returns ‘false’ and doesn’t allow the form to post. Instead, it pops up a helpful alert, erases the contents of the ‘verify’ field, and sets the cursor at the beginning of the field.

Add the JavaScript to your HTML with something like:

<head>
...
<script src="validate.js" type="text/javascript"></script>
...
</head>

Next, modify the form to call the function with an onSubmit event. This event will be triggered when the form’s Submit button is activated. Add an input field with the ID ‘verify’ and an onChange event to convert the value to lowercase. Add the actual challenge question as a label.

<form id="contactform" action="../webmail.php" method="post" onsubmit="return validateForm();">
...
	<div><input type="text" name="verify" id="verify" value="" size="22" tabindex="1" onchange="this.value=this.value.toLowerCase();" /></div>
	<label for="verify">What color is an orange?</label>
...
</form>

A visitor to the site who fills out the form but does not correctly answer the challenge question will not be able to submit the form.
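The strict comparison with "orange" will still reject an answer that has a stray space before or after it. A possible refinement, sketched here with a hypothetical helper named isCorrectAnswer, is to normalize the value inside the validator instead of relying only on the onchange handler:

```javascript
// Hypothetical helper: trims and lowercases the visitor's answer before
// comparing it to the expected (lowercase) answer.
function isCorrectAnswer(value, expected) {
    return String(value).trim().toLowerCase() === expected;
}

// Inside validateForm(), the test could then read:
//     if ( !isCorrectAnswer(document.getElementById('verify').value, "orange") )
```

With this in place, "Orange", " orange " and "ORANGE" are all accepted, while any other word is still rejected.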

Defeating non-human-like robots

I believe that the challenge-response method is becoming less effective, however. According to the article ‘Spamming You Through Your Own Forms’ by Will Bontrager, the spammers’ bots are not using the form as it is intended.

This is what appears to be happening: Spammers’ robots are crawling the web looking for forms. When the robot finds a form:

1. It makes a note of the form field names and types.

2. It makes a note of the form action= URL, converting it into an absolute URL if needed.

3. It then sends the information home where a database is updated.

4. Dedicated software uses the database information to insert the spammer’s spew into your form and automatically submit it to you.

His response is to stop the process at step 2 by eliminating the bots’ access to the webmail script. He suggests doing this by hiding the URL of the webmail script in an external JavaScript file, then using JavaScript to delay the writing of the form’s action attribute for a moment. The robots parsing just the page’s HTML never locate the URL to the webmail script, so it is never available for the spammers to exploit.

While I like the idea, I think I’ve come up with a better way of implementing it.

First, rename the webmail script, because the spammers already know the name and location of that script. For example, if GoDaddy is your host, contact forms on your site may be handled by ‘gdform.php’, located in the server root. You’ll need to rename that to something else. For purposes of illustration, I’ll rename the script ‘safemail.php’, but a string of random hexadecimal characters would be even better.

Next, give your contact form an ID. If you are running WordPress or other blogging software, be sure to give the contact form a different ID than the comment form, or else the JavaScript will cause the comment form to post to the webmail script. I’ll give my contact form the ID ‘contactform’.

<form id="contactform" action="../gdform.php" method="post">

We want to prevent the spammers from learning about the newly renamed script. This is done by giving the URL to a fake webmail script as the form’s action attribute and using JavaScript to change the action attribute of the form to the real webmail script only after some user interaction has occurred. I’ll use ‘no-javascript.php’ as my fake script.

To accommodate visitors who aren’t using JavaScript, the fake script could instead be a page explaining that JavaScript is required to submit the contact form and offering an alternate way to contact the author.

Edit the contact form’s action attribute to point to the fake script.

<form id="contactform" action="no-javascript.php" method="post">

Create a new, external JavaScript file called ‘protect.js’, with the following lines:

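function formProtect() {
	// Point the form at the real webmail script.
	document.getElementById("contactform").setAttribute("action","safemail.php");
	// Return true explicitly so that onsubmit="return formProtect();" lets the form submit.
	return true;
}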

The function formProtect, when called, finds the HTML element with ID ‘contactform’ and changes its ‘action’ attribute to ‘safemail.php’. Obviously, one could make this script more complex and potentially more difficult for spammers to parse through the use of variables, but I don’t see that as necessary at this point.

Add the JavaScript to your HTML with something like:

<head>
...
<script src="protect.js" type="text/javascript"></script>
...
</head>

Finally, call the script at some point during the process of filling out the form. Exactly how you want to do this is up to you, and it’ll be effective longer if you don’t share how you do it. Perhaps the most straightforward way would be to call the script at the point of submission by adding onsubmit="return formProtect();" to the <form> element.

<form id="contactform" action="no-javascript.php" method="post" onsubmit="return formProtect();">

If you want to use both the challenge question and the action rewriting functions, you may want to combine them into a single file or trigger formProtect separately with an event on one of the required input fields. If you decide to trigger formProtect with an event other than onsubmit, consider usability/accessibility issues—not everyone uses a mouse.
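A combined handler might look something like the following. This is only a sketch under the assumptions used in this post (a field with the ID ‘verify’, the answer ‘orange’, and the real script named ‘safemail.php’); the function name submitHandler is made up.

```javascript
// Hypothetical combined handler: check the answer first, and only reveal the
// real webmail script when the answer is correct.
function submitHandler(form, verifyField) {
    var answer = String(verifyField.value).trim().toLowerCase();
    if (answer !== "orange") {
        // Wrong answer: clear the field and cancel the submission.
        // The action attribute still points at the fake script.
        verifyField.value = "";
        return false;
    }
    // Correct answer: swap in the real script and let the form submit.
    form.setAttribute("action", "safemail.php");
    return true;
}

// Possible usage on the form element:
//     onsubmit="return submitHandler(this, document.getElementById('verify'));"
```

Because the action is rewritten only after the challenge is passed, a wrong answer never exposes the real URL.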

In conclusion

By implementing both of these methods, it is possible to dramatically reduce or even completely stop contact form spam. In the two months since I implemented this system, I haven’t received a single spam email from any of my contact forms.

The challenge-response test should deter or at least hinder human spammers and robots that fill out forms as though they were human. The trade-off is some added work for legitimate users of the form.

The action attribute rewriting method should immediately eliminate all spam sent directly to your form by spammers who have the URL of your webmail script in their databases. It should also prevent the rediscovery of the URL. Visitors with JavaScript enabled won’t be aware of the anti-spam measures.

For WordPress users

Defeating WordPress comment spam explains how to apply the attribute rewriting method to your WordPress site.

A plugin for adding the post date to wp_get_archives

The WordPress function wp_get_archives('type=postbypost') displays a lovely list of posts, but won’t show the date of each post. This plugin adds each post’s date to those ‘postbypost’ lists, like so:

Add dates to wp_get_archives

Usage

Upload and activate the plugin
Edit your theme, replacing wp_get_archives('type=postbypost') with if (function_exists('ard_get_archives')) ard_get_archives();

The function ard_get_archives(); replaces wp_get_archives('type=postbypost'), meaning you don’t need to specify type=postbypost. You can use the remaining wp_get_archives() parameters (limit, format, before, and after), but not ‘type’ or ‘show_post_count’. In addition, there’s a new parameter, show_post_date, which you can use to hide the date; the plugin shows the date by default.

show_post_date
(boolean) Display date of posts in an archive (1 – true) or do not (0 – false). For use with ard_get_archives(). Defaults to 1 (true).

Customizing the date

By default, the plugin displays the date as “(MM/DD/YYYY)”, but you can change this to use any standard PHP date characters by editing the plugin at the line:

$arc_date = date('m/d/Y', strtotime($arcresult->post_date));  // new

The date is wrapped in <span class="recentdate"> tags, so you can style the date independently of the link.

How does it work?

The plugin replaces the ‘postbypost’ part of the function wp_get_archives, and adds the date to $before. The relevant code is below. You can compare it to the corresponding lines in general-template.php.

	} elseif ( ( 'postbypost' == $type ) || ('alpha' == $type) ) {
		('alpha' == $type) ? $orderby = "post_title ASC " : $orderby = "post_date DESC ";
		$arcresults = $wpdb->get_results("SELECT * FROM $wpdb->posts $join $where ORDER BY $orderby $limit");
		if ( $arcresults ) {
			$beforebefore = $before;  // new
			foreach ( $arcresults as $arcresult ) {
				if ( $arcresult->post_date != '0000-00-00 00:00:00' ) {
					$url  = get_permalink($arcresult);
					$arc_title = $arcresult->post_title;
					$arc_date = date('m/d/Y', strtotime($arcresult->post_date));  // new
					if ( $show_post_date )  // new
						$before = $beforebefore . '<span class="recentdate">' . $arc_date . '</span>';  // new
					if ( $arc_title )
						$text = strip_tags(apply_filters('the_title', $arc_title));
					else
						$text = $arcresult->ID;
					echo get_archives_link($url, $text, $format, $before, $after);
				}
			}
		}
	}

The lines ending in ‘// new’ are the only changes.

So you want the date to appear after the title? Edit the plugin to modify $after, instead:

	} elseif ( ( 'postbypost' == $type ) || ('alpha' == $type) ) {
		('alpha' == $type) ? $orderby = "post_title ASC " : $orderby = "post_date DESC ";
		$arcresults = $wpdb->get_results("SELECT * FROM $wpdb->posts $join $where ORDER BY $orderby $limit");
		if ( $arcresults ) {
			$afterafter = $after;  // new
			foreach ( $arcresults as $arcresult ) {
				if ( $arcresult->post_date != '0000-00-00 00:00:00' ) {
					$url  = get_permalink($arcresult);
					$arc_title = $arcresult->post_title;
					$arc_date = date('j F Y', strtotime($arcresult->post_date));  // new
					if ( $show_post_date )  // new
						$after = '&nbsp;(' . $arc_date . ')' . $afterafter;  // new
					if ( $arc_title )
						$text = strip_tags(apply_filters('the_title', $arc_title));
					else
						$text = $arcresult->ID;
					echo get_archives_link($url, $text, $format, $before, $after);
				}
			}
		}
	}

Download

Get the files here: (Current version: 0.1 beta)

Download the Ardamis DateMe WordPress Plugin

Apricot – A Minimalist WordPress Theme

Apricot is a text-heavy and graphic-light, widget- and tag-supporting minimalist WordPress theme built on a Kubrick foundation. Apricot validates as XHTML 1.0 Strict and uses valid CSS. It natively supports the excellent Other Posts From Cat and the_excerpt Reloaded plugins, should you want to install them.

WordPress version 2.3 introduces native support for ‘tags’, a method of organizing posts according to key words. Apricot has been updated to use this native tag system. The tag cloud will appear in the sidebar and the tags for each post appear above the meta data.

I used Apricot on this site for over a year, making little tweaks and adjustments the whole time, so the theme is pretty thoroughly tested in a variety of different browsers and resolutions. While the markup is derived from the WordPress default theme, Kubrick, I’ve added a few modifications of my own. I’ve listed some of these changes below.

header.php

Title tag reconfigured to display “Page Title | Site Name”

single.php

Post title is now wrapped in H1 tags
Metadata shows when the post was last modified (if ever)
Added links to social bookmarking/blog indexing sites: Del.icio.us, Digg, Furl, Google Bookmarks, and Technorati
I’ve published a fix for the Sociable plugin, which I’m now using instead of hard-coded links
If the Other Posts From Cat plugin is active, the theme will use it
Comments by the post’s author can be styled independently

page.php

Displays the page’s last modified date (instead of date of publication)

index.php

Displays the full text of the latest post and an excerpt from each of the next nine most recent posts
Native support for the_excerpt Reloaded plugin, if active

sidebar.php

Displays tag cloud, if tags are enabled

search.php

If no results found, displays the site’s most recent five posts

404.php

Displays the site’s most recent five posts

footer.php

Archive and index page titles + blog name wrapped in H1 tags

Screen shot

Search engine optimization

Apricot takes care of most of the on-page factors that Google values highly. It places the post’s title at the beginning of the title tag and in an H1 tag near the top of the page. It is free of extraneous markup and the navigation is easily spiderable. It generates what I think is a pretty logical site structure from the various post and category pages, though I have yet to study the effect of the new tagging system.

I’ve had a few top-ranked pages with this and other structurally similar layouts. Your mileage with the search engines may vary, but the layout uses fundamentally sound structural markup, which should give your site a good start.

Download

Download the theme from http://wordpress.org/extend/themes/apricot or from the link below.

Download the Apricot WordPress Theme

What if I want to use an image as a header?

Lots of people would rather use a graphic as a header, including me, but the WordPress guys insist that each theme uploaded to http://wordpress.org/extend/themes/ display the blog title and tag line.

If you want to replace the blog title and tag line with an image, download this zip file and follow these instructions (also included in readme.txt).

1. Make a PNG image, name it “header.png” and upload it to the /wp-content/themes/apricot/images/ folder. It should be 800px wide by 130px tall, or less.

2. Replace the original Apricot theme’s header.php file with the header.php file from this folder.

Download the Apricot Image Header