Tag Archives: database

Akismet is not enough to stop spam comments

I like Akismet, and it’s undeniably effective in stopping the vast majority of spam, but it adds a huge number of comments to the database and a very small percentage of comments still get through to my moderation queue.

It’s annoying to find comments in my moderation queue, but what I really object to is the thousands of records that are added to the database each month that I don’t see.

In the screenshot below, January through April show very few spam comments being detected by Akismet. This is because I was using my cache-friendly method for reducing WordPress comment spam to block spam comments even before Akismet analyzed them.

In May, I moved hosting providers to asmallorange.com and started with a fresh install of WordPress without implementing my custom spam method, which admittedly was not ideal because it involved changing core files. This left only Akismet between the spammers and my WordPress database. Since that time, instead of 150 or fewer spam comments per month making it into my WordPress database, Akismet was on pace to let in over 10,000.

So, in the spirit of fresh starts and doing things the right way, I created a WordPress plug-in that uses the same timestamp method. It’s actually exactly the same JavaScript and PHP code, just in plug-in form, so it’s not bound to any core files or theme files.

What is this Joomla exploit doing on my WordPress site?

7:21 PM 2/26/2012

I recently ran the spider at www.xml-sitemaps.com against www.ardamis.com and it returned a list of URLs that included a few pages with some suspicious-looking parameters. This is the second time I’ve come across these URLs, so I decided to document what was going on. The first time, I just cleared the cache, spidered the site to preload the cache, and confirmed that the spider didn’t encounter the pages. And then I forgot all about it. But now I’m mad.

Normally, a URL list for a WordPress site includes the various pages of the site, like so:

//ardamis.com/
//ardamis.com/page/2/
//ardamis.com/page/3/

But in the suspicious URL list, there are additional URLs for the pages directly off of the site’s root.

//ardamis.com/
//ardamis.com/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/2/
//ardamis.com/page/2/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/3/
//ardamis.com/page/3/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000

This occurs only for the pagination of the main site’s pages. I did not find URLs containing the parameter ?option=com_google&controller= for any pages that exist under a category or tag, but that also use the /page/2/ convention.

The parameter is the urlencoded version of the text:

?option=com_google&controller=..//..//..//..//..//..//..//..///proc/self/environ00

Exploration

I compared the source code of the pages at the clean URLs vs that of the pages at the bad URLs and found that there was a difference in the pagination code generated by the WP-Paginate plugin.

The good pages had normal-looking pagination links.

<div class="navigation">
<ol class="wp-paginate">
<li><span class="title">Navigation:</span></li>
<li><a href="//ardamis.com/page/2/" class="prev">&laquo;</a></li>
<li><a href='//ardamis.com/' title='1' class='page'>1</a></li>
<li><a href='//ardamis.com/page/2/' title='2' class='page'>2</a></li>
<li><span class='page current'>3</span></li>
<li><a href='//ardamis.com/page/4/' title='4' class='page'>4</a></li>
<li><a href='//ardamis.com/page/5/' title='5' class='page'>5</a></li>
<li><a href='//ardamis.com/page/6/' title='6' class='page'>6</a></li>
<li><a href='//ardamis.com/page/7/' title='7' class='page'>7</a></li>
<li><span class='gap'>...</span></li>
<li><a href='//ardamis.com/page/17/' title='17' class='page'>17</a></li>
<li><a href="//ardamis.com/page/4/" class="next">&raquo;</a></li>
</ol>
</div>

The bad pages had the suspicious URLs, but were otherwise identical. Other than the URLs in the navigation, there was nothing alarming about the HTML on the bad pages.

I downloaded the entire site and ran a malware scan against the files, which turned up nothing. I also did some full-text searching of the files for the usual base64 decode eval type stuff, but nothing was found. I searched through the tables in my database, but didn’t see any instances of com_google or proc or environ that I could connect to the suspicious URLs.

Google it

Google has turned up a few good links about this problem, including:

http://www.exploitsdownload.com/search/com_/36 – AntiSecurity/Joomla Component Contact Us Google Map com_google Local File Inclusion Vulnerability
http://forums.oscommerce.com/topic/369813-silly-hacker/ – “On a poorly-secured LAMP stack, that would read out your server’s environment variables. That is one step in a process that would grant the hacker root access to your box. Be thankful it’s not working. Hacker is a bad term for this. This is more on the Script Kiddie level.”
The poster also provided a few lines of code for blocking these URLs in an .htaccess file.
```
# Block another hacker
RewriteCond %{QUERY_STRING} ^(.*)/self/(.*)$ [NC]
RewriteRule ^.* - [F]
```
http://forums.oscommerce.com/topic/369813-silly-hacker/ – “This was trying for Local File Inclusion vulnerabilities via the Joomla/Mambo script.”
http://core.trac.wordpress.org/ticket/14556 – a bug ticket submitted to WordPress over a year earlier identifying a security hole if the function that generates the pagination isn’t wrapped in a url_esc function that sanitizes the URL. WP-Paginate’s author submits a comment to the thread, and the plugin does use url_esc.

So, what would evidence of an old Joomla exploit be doing on my WordPress site? And what is happening within the WP-Paginate plugin to cause these parameters to appear?

Plugins

It seemed prudent to take a closer look at two of the plugins used on the site.

Ardamis uses the WP-Paginate plugin. The business of generating the /page/2/, /page/3/ URLs is a native WordPress function, so it’s strange to see how those URLs become subject to some sort of injection by way of the WP-Paginate plugin. I tried passing a nonsense parameter in a URL (//ardamis.com/page/3/?foobar) and confirmed that the navigation links created by WP-Paginate contained that ?foobar parameter within each link. This happens on category pages, too. This behavior of adding any parameters passed in the URL to the links it is writing into the page, even if they are urlencoded, is certainly unsettling.

The site also uses the WP Super Cache plugin. While this plugin seems to have been acting up lately, in that it’s not reliably preloading the cache, I can’t make a connection between it and the problem. I also downloaded the cache folder and didn’t see cached copies of these URLs. I turned off caching in WP Super Cache but left the plugin activated, cleared the cache, and then sent the spider against the site again. This time, the URL list didn’t contain any of the bad URLs. Otherwise, the lists were identical. I re-enabled the plugin, attempted to preload the cache (it got through about 70 pages and then stopped), and then ran a few spiders against the site to finish up the preloading. I generated another URL list and the bad URLs didn’t appear in it, either.

A simple fix for the WP-Paginate behavior

The unwanted behavior of the WP-Paginate plugin can be corrected by changing a few lines of code to strip off the GET parameters from the URL. The lines to be changed all reference the function get_pagenum_link. I’m wrapping that function in the string tokenizing function strtok to strip the question mark and everything that follows.

The relevant snippets of the plugin are below.

			
$prevlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page - 1), '?'))
: get_comments_pagenum_link($page - 1);
$nextlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page + 1), '?'))
: get_comments_pagenum_link($page + 1);

			
function paginate_loop($start, $max, $page = 0) {
    $output = "";
    for ($i = $start; $i <= $max; $i++) {
        $p = ($this->type === 'posts') ? esc_url(strtok(get_pagenum_link($i), '?')) : get_comments_pagenum_link($i);
        $output .= ($page == intval($i))
        ? "<li><span class='page current'>$i</span></li>"
        : "<li><a href='$p' title='$i' class='page'>$i</a></li>";
    }
    return $output;
}

Once these changes are made, WP-Paginate will no longer insert any passed GET parameters into the links it’s writing into that page.

Bandaid

The change to the WP-Paginate plugin is what we tend to call a bandaid – it doesn’t fix the problem, it just suppresses the symptom.

I’ve found that once the site picks up the bad URLs, they can be temporarily cleaned by clearing the cache and then using a spider to recreate it. The only thing left to do is determine where they are coming from in the first place.

The facts

Let’s pause to review the facts.

The http://www.xml-sitemaps.com spider sent against //ardamis.com discovers pages with odd parameters that shouldn’t be naturally occurring on the pages
The behavior of the WP-Paginate plugin is to accept any parameters passed and tack them onto the URLs it is generating
Deleting the cached pages created by WP Super Cache and respidering produces a clean list – the bad URLs are absent

So how is the spider finding pages with these bad URLs? How are they first getting added to a page on the site? It would seem likely that they are originating only on the home page, and the absence of the parameters on other pages that use pagination seems to support that theory.

An unsatisfying ending

Well, the day is over. I’ve added my updated WP-Paginate plugin to the site, so hopefully Ardamis has seen the last of the problem, but I’m deeply unsatisfied that I haven’t been able to get to the root cause. I’ve scoured the site and the database, and I can’t find any evidence of the URLs anywhere. If the bad URLs come back again, I’ll not be so quick to clean up the damage, and will instead try to preserve it long enough to make a determination as to their origin.

Update 07 April 2012: It’s happened again. When I spider the site, two pages have the com_google URL. These page have the code appended to the end of the URL created by the WordPress function cancel_comment_reply_link(). This function generates the anchor link in the comments area with an ID of cancel-comment-reply-link. This time, though, I see the hijacked URL used in the link even when I visit the clean URL of the page.

This code is somehow getting onto the site in such a way that it only shows up in the WP Super Cache’d pages. Clearing the cache and revisiting the page returns a clean page. My suspicion is that someone is visiting my pages with the com_google code as part of the URL. WordPress puts the code into a self-referencing link in the comment area. WP Super Cache then updates the cache with this page. I don’t think WordPress can help but work this way with nested comments, but WP Super Cache should know better than to create a cached page from anything but the content from the server.

In the end, because I wasn’t using nested comments to begin with, I chose to remove the block of code that was inserting the link from my theme’s comments.php file.

    <div class="cancel_comment_reply">
        <small><?php cancel_comment_reply_link(); ?></small>
    </div>

I expect that this will be the last time I find this type of exploit on ardamis.com, as I don’t think there is any other mechanism that will echo out on the page the contents of a parameter passed in the URL.

A PHP MySQL snippet for looking up names

For a recent project, I needed to create a form that would perform a look up of people names in a MySQL database, but I wanted to use a single input field. To make it easy on the users of the form, I wanted the input field to accept names in either “Firstname Lastname” or “Lastname, Firstname” format, and I wanted it to autocomplete matches as the users typed, including when they typed both names separated by a space or a comma followed by a space.

The Ajax lookup was quick work with jQuery UI’s Autocomplete widget. The harder part was figuring out the most simple table structure and an appropriate SQL query.

A flawed beginning

My people table contains a “first_name” column and a “last_name” column, nothing uncommon there. To get the project out the door, I wrote a PHP function that ran two ALTER TABLE queries on the people table to create two additional columns for pre-formatted strings (column “firstlast”, to be formatted as “Firstname Lastname”, and column “lastfirst”, to be formatted as “Lastname, Firstname”), added indexes on these columns, and then walked through each record in the table, populating these new fields. I then wrote a very straight forward SQL query to perform a lookup on both fields. The PHP and query looked something like this:

// The jQuery UI Autocomplete widget passes the user input as a value for the parameter "name"
$name= $_GET['name'];

// This SQL query uses argument swapping
$query = sprintf("SELECT * FROM people WHERE (`firstlast` LIKE '%1\$s' OR `lastfirst` LIKE '%1\$s') ORDER BY `lastfirst` ASC",
mysql_real_escape_string($name. "%", $link));

This was effective, accurate, and pretty fast, but the addition of columns bothered me and I didn’t like that I needed to run a process to generate those pre-formatted fields each time a record was added to the table (or if a change was made to an existing record). One possible alternative was to watch the input and match either lastname or firstname until the user entered a comma or a space, then explode the string on the comma or space and search more precisely. Once a comma or a space was encountered, I felt pretty sure that I would be able to accurately determine which part of the input was the first name and which was the last name. But this had that same inefficient, clunky bad-code-smell as the extra columns. (Explode is one of those functions that I try to avoid using.) Writing lots of extra PHP didn’t seem necessary or right.

I’m much more comfortable with PHP than with MySQL queries, but I realize that one can do some amazing things within the SQL query, and that it’s probably faster to use SQL to perform some functions. So, I decided that I’d try to work up a query that solved my problem, rather than write more lines of PHP.

CONCAT_WS to the rescue

I Googled around for a bit and settled on using CONCAT_WS to concatenate the first names and last names into a single string be matched, but found it a bit confusing to work with. I kept trying to use it to create an alias, “lastfirst”, and then use the alias in the WHERE clause, which doesn’t work, or I was getting the literal column names back instead of the values. Eventually, I hit upon the correct usage.

The PHP and query now looks like this:

// The jQuery UI Autocomplete widget passes the user input as a value for the parameter "name"
$name= $_GET['name'];

// This SQL query uses argument swapping
$query = sprintf("SELECT *, CONCAT_WS(  ', ',  `last_name`,  `first_name` ) as lastfirst FROM people WHERE (CONCAT_WS(  ', ',  `last_name`,  `first_name` ) LIKE '%1\$s' OR CONCAT_WS(  ' ',  `first_name`,  `last_name` ) LIKE '%1\$s') ORDER BY lastfirst ASC",
mysql_real_escape_string($name. "%", $link));

The first instance of CONCAT_WS isn’t needed for the lookup. The first instance allows me to order the results alphabetically and provides me an array key of “lastfirst” with a value of the person’s name already formatted as “Lastname, Firstname”, so I don’t have to do it later with PHP. The lookup comes from the two instances of CONCAT_WS in the WHERE clause. I haven’t done any performance measuring here, but the results of the lookup get back to the user plenty fast enough, if not just as quickly as the method using dedicated columns.

The result of the query is output back to the page as JSON-formatted data for use in the jQuery Autocomplete.

The end result works exactly as I had hoped. A user of the form is able to type a person’s name in whatever way is comfortable to them, as “Bob Smith” or “Smith, Bob”, and the matches are found either way. The only thing it doesn’t do is output the matches back to the autocompleter in the same format that the user is using. But I can live with that for now.

A cache-friendly method for reducing WordPress comment spam

Update 2015-01-02: About a month ago, in early December, 2014, Google announced that it was working on a new anti-spam API that is intended to replace the traditional CAPTCHA challenge as a method for humans to prove that they are not robots. This is very good news.
This week, I noticed that Akismet is adding a hidden input field to the comment form that contains a timestamp (although the plugin’s PHP puts the initial INPUT element within a P element set to DISPLAY:NONE, when the plugin’s JavaScript updates the value with the current timestamp, the INPUT element jumps outside of that P element). The injected code looks something like this:
<input type=”hidden” id=”ak_js” name=”ak_js” value=”1420256728989″>
I haven’t yet dug into the Akismet code to discover what it’s doing with the timestamp, but I’d be pleased if Akismet is attempting to differentiate humans from bots based on behavior.

Update 2015-01-10: To test the effectiveness of the current version of Akismet, I disabled the anti-spam plugin described in this post on 1/2/2015 and re-enabled it on 1/10/2015. In the span of 8 days, Akismet identified 1,153 spam comments and missed 15 more. These latest numbers continue to support my position that Akismet is not enough to stop spam comments.

In the endless battle against WordPress comment spam, I’ve developed and then refined a few different methods for preventing spam from getting to the database to begin with. My philosophy has always been that a human visitor and a spam bot behave differently (after all, the bots we’re dealing with are not Nexus-6 model androids here), and an effective spam-prevention method should be able to recognize the differences. I also have a dislike for CAPTCHA methods that require a human visitor to prove, via an intentionally difficult test, that they aren’t a bot. The ideal method, I feel, would be invisible to a human visitor, but still accurately identify comments submitted by bots.

Spam on ardamis.com in early 2012 - before and after

Spam on ardamis.com - before and after

A brief history of spam fighting

The most successful and simple method I found was a server-side system for reducing comment spam by using a handshake method involving timestamps on hidden form fields that I implemented in 2007. The general idea was that a bot would submit a comment more quickly than a human visitor, so if the comment was submitted too soon after the post page was loaded, the comment was rejected. A human caught in this trap would be able to click the Back button on the browser, wait a few seconds, and resubmit. This proved to be very effective on ardamis.com, cutting the number of spam comments intercepted by Akismet per day to nearly zero. For a long time, the only problem was that it required modifying a core WordPress file: wp-comments-post.php. Each time WordPress was updated, the core file was replaced. If I didn’t then go back and make my modifications again, I would lose the spam protection until I made the changes. As it became easier to update WordPress (via a single click in the admin panel) and I updated it more frequently, editing the core file became more of a nuisance.

A huge facepalm

When Google began weighting page load times as part of its ranking algorithm, I implemented the WP Super Cache caching plugin on ardamis.com and configured it to use .htaccess and mod_rewrite to serve cache files. Page load times certainly decreased, but the amount of spam detected by Akismet increased. After a while, I realized that this was because the spam bots were submitting comments from static, cached pages, and the timestamps on those pages, which had been generated server-side with PHP, were already minutes old when the page was requested. The form processing script, which normally rejects comments that are submitted too quickly to be written by a human visitor, happily accepted the timestamps. Even worse, a second function of my anti-spam method also rejected comments that were submitted 10 minutes or more after the page was loaded. Of course, most of the visitors were being served cached pages that were already more than 10 minutes old, so even legitimate comments were being rejected. Using PHP to generate my timestamps obviously was not going to work if I wanted to keep serving cached pages.

JavaScript to the rescue

Generating real-time timestamps on cached pages requires JavaScript. But instead of a reliable server clock setting the timestamp, the time is coming from the visitor’s system, which can’t be trusted to be accurate. Merely changing the comment form to use JavaScript to generate the first timestamp wouldn’t work, because verifying a timestamp generated on the client-side against one generated server-side would be disastrous.

Replacing the PHP-generated timestamps with JavaScript-generated timestamps would require substantial changes to the system.

Traditional client-side form validation using JavaScript happens when the form is submitted. If the validation fails, the form is not submitted, and the visitor typically gets an alert with suggestions on how to make the form acceptable. If the validation passes, the form submission continues without bothering the visitor. To get our two timestamps, we can generate a first timestamp when the page loads and compare it to a second timestamp generated when the form is submitted. If the visitor submits the form too quickly, we can display an alert showing the number of seconds remaining until the form can be successfully submitted. This client-side validation should hopefully be invisible to most visitors who choose to leave comments, but at the very least, far less irritating than a CAPTCHA system.

It took me two tries to get it right, but I’m going to discuss the less successful method first to point out its flaws.

Method One (not good enough)

Here’s how the original system flowed.

Generate a first JS timestamp when the page is loaded.
Generate a second JS timestamp when the form is submitted.
Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, write a pre-determined passcode to a hidden INPUT element, then submit the form.
After the form contents are sent to the server, use server-side logic to verify that the passcode is present and valid.

The problem was that it seemed that certain bots could parse JavaScript enough to drop the pre-determined passcode into the hidden form field before submitting the form, circumventing the timestamps completely and defeating the system.

Because the timestamps were only compared on the client-side, it also failed to adhere to one of the basic tenants of form validation – that the input must be checked on both the client-side and the server-side.

Method Two (better)

Rather than having the server-side validation be merely a check to confirm that the passcode is present, method two compares the timestamps a second time on the server side. Instead of a single hidden input, we now have two – one for each timestamp. This is intended to prevent a bot from figuring out the ultimate validation mechanism by simply parsing the JavaScript. Finally, the hidden fields are not in the HTML of the page when it’s sent to the browser, but are added to the form via jQuery, which makes it easier to implement and may act as another layer of obfuscation.

Generate a first JS timestamp when the page is loaded and write it to a hidden form field.
Generate a second JS timestamp when the form is submitted and write it to a hidden form field.
Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, submit the form (client-side validation).
On the form processing page, use server-side logic to compare the timestamps a second time (server-side validation).

This timestamp handshake works more like it did in the proven-effective server-side-only method. We still have to pass something from the comment form to the processing script, but it’s not too obvious from the HTML what is being done with it. Furthermore, even if a bot suspects that the timestamps are being compared, there is no telling from the HTML what the threshold is for distinguishing a valid comment from one that is invalid. (The JavaScript could be parsed by a bot, but the server-side check cannot be, making it possible to require a slightly longer amount of time to elapse in order to pass the server-side check.)

The same downside plagued me

For a long time, far longer than I care to admit, I stubbornly continued to modify the core file wp-comments-post.php to provide the server-side processing. But creating the timestamps and parsing them with a plug-in turned out to be a simple matter of two functions, and in June of 2013 I finally got around to doing it the right way.

The code

The plugin, in all its simplicity, is only 100 lines. Just copy this code into a text editor, save it as a .php file (the name isn’t important) and upload it to the /wp-content/plugins directory and activate it. Feel free to edit it however you like to suit your needs.

<?php

/*
Plugin Name: Timestamp Comment Filter
Plugin URI: //ardamis.com/2011/08/27/a-cache-proof-method-for-reducing-comment-spam/
Description: This plugin measures the amount of time between when the post page loads and the comment is submitted, then rejects any comment that was submitted faster than a human probably would or could.
Version: 0.1
Author: Oliver Baty
Author URI: //ardamis.com

    Copyright 2013  Oliver Baty  (email : obbaty@gmail.com)

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

// http://wordpress.stackexchange.com/questions/6723/how-to-add-a-policy-text-just-before-the-comments
function ard_add_javascript(){

	?>
	
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    ardGenTS1();
});
 
function ardGenTS1() {
    // prepare the form
    $('#commentform').append('<input type="hidden" name="ardTS1" id="ardTS1" value="1" />');
    $('#commentform').append('<input type="hidden" name="ardTS2" id="ardTS2" value="1" />');
    $('#commentform').attr('onsubmit', 'return validate()');
    // set a first timestamp when the page loads
    var ardTS1 = (new Date).getTime();
    document.getElementById("ardTS1").value = ardTS1;
}
 
function validate() {
    // read the first timestamp
    var ardTS1 = document.getElementById("ardTS1").value;
//  alert ('ardTS1: ' + ardTS1);
    // generate the second timestamp
    var ardTS2 = (new Date).getTime();
    document.getElementById("ardTS2").value = ardTS2;
//  alert ('ardTS2: ' + document.getElementById("ardTS2").value);
    // find the difference
    var diff = ardTS2 - ardTS1;
    var elapsed = Math.round(diff / 1000);
    var remaining = 10 - elapsed;
//  alert ('diff: ' + diff + '\n\n elapsed:' + elapsed);
    // check whether enough time has elapsed
    if (diff > 10000) {
        // submit the form
        return true;
    }else{
        // display an alert if the form is submitted within 10 seconds
        alert("This site is protected by an anti-spam feature that requires 10 seconds to have elapsed between the page load and the form submission. \n\n Please close this alert window.  The form may be resubmitted successfully in " + remaining + " seconds.");
        // prevent the form from being submitted
        return false;
    }
}
</script>
	
	<?php
}

add_action('comment_form_before','ard_add_javascript');

// http://wordpress.stackexchange.com/questions/89236/disable-wordpress-comments-api
function ard_parse_timestamps(){

	// Set up the elapsed time, in miliseconds, that is the threshold for determining whether a comment was submitted by a human
	$intThreshold = 10000;
	
	// Set up a message to be displayed if the comment is blocked
	$strMessage = '<strong>ERROR</strong>:  this site uses JavaScript validation to reduce comment spam by rejecting comments that appear to be submitted by an automated method.  Either your browser has JavaScript disabled or the comment appeared to be submitted by a bot.';
	
	$ardTS1 = ( isset($_POST['ardTS1']) ) ? trim($_POST['ardTS1']) : 1;
	$ardTS2 = ( isset($_POST['ardTS2']) ) ? trim($_POST['ardTS2']) : 2;
	$ardTS = $ardTS2 - $ardTS1;
	 
	if ( $ardTS < $intThreshold ) {
	// If the difference of the timestamps is not more than 10 seconds, exit
		wp_die( __($strMessage) );
	}
}
add_action('pre_comment_on_post', 'ard_parse_timestamps');

?>

That’s it. Not so bad, right?

Final thoughts

The screen-shot at the beginning of the post shows the number of spam comments submitted to ardamis.com and detected by Akismet each day from the end of January, 2012, to the beginning of March, 2012. The dramatic drop-off around Jan 20 was when I implemented the method described in this post. The flare-up around Feb 20 was when I updated WordPress and forgot to replace the modified core file for about a week, illustrating one of the hazards of changing core files.

If you would rather not add any hidden form fields to the comment form, you could consider appending the two timestamps to the end of the comment_post_ID field. Because its contents are cast as an integer in wp-comments-post.php when value of the $comment_post_ID variable is set, WordPress won’t be bothered by the extra data at the end of the field, so long as the post ID comes first and is followed by a space. You could then just explode the contents of the comment_post_ID field on the space character, then compare the last two elements of the array.

If you don’t object to meddling with a core file in order to obtain a little extra protection, you can rename the wp-comments-post.php file and change the path in the comment form’s action attribute. I’ve posted logs showing that some bots just try to post spam directly to the wp-comments-post.php file, so renaming that file is an easy way to cut down on spam. Just remember to come back and delete the wp-comments-post.php file each time you update WordPress.

Jealous of The Social Network

So I finally watched The Social Network over the weekend, and it’s made me feel jealous and a bit guilty.

In a meager effort to console myself for so far failing to be a billionaire, I’m assembling the short list of web-application type things I’ve built here.

A dice roller: rollforit. Enter a name, create a room, invite your friends, and start rolling dice. For people who want to play pen and paper, table-top RPG dice games with their distant friends.
A URL shortener: Minifi.de. Minifi.de comes with an API and a bookmarklet. It really works, too! The technical explanation has more details.
A social networking site: Snapbase. Snapbase is a social site that shows you what’s going on in your city or anywhere in the world as pictures are uploaded by your friends and neighbors. The application extracts location information from the EXIF data embedded in images and displays recent images taken near your present location.
A trouble-ticketing system for an IT help desk or technical support center. It’s really pretty extensive, with asset management, user accounts, salted encrypted passwords, and all sorts of nifty things. I really must write a full description of it at some point, but until then, the documentation is the next best thing.
An account-based invoice tracking and access system for grouping invoices according to clients, then sharing invoice history with those clients and allowing them to easily pay outstanding invoices via Paypal.
An account-based invoice access system where clients can view paid and unpaid invoices, and even easily pay an outstanding invoice via Paypal. I actually use this almost every day.
A simple method for protecting a download using a unique URL that can be emailed to authorized users. The URL can be set to expire after a certain amount of time or any number of downloads.
An update to the above download protection script to protect multiple downloads, generate batches of keys, leave notes about who received the key, the ability to specify per-key the allowable number of downloads and age, and some basic reporting.
An HTML auction template generator called Simple Auction Wizard. It helps you create HTML auction templates for eBay, and uses SWFUpload and tinyMCE.

I have another project in the works that promises to be more financially viable, but the most clever thing on that list is Snapbase. It’s in something akin to alpha right now; barely usable. I really wish I had the time to pursue it.

Online tools for testing web page performance

I’ve written a few tutorials lately on how to reduce page load times. While I use Google’s Page Speed Firefox/Firebug plugin for evaluating pages for load times, there are times when I want a second opinion, or want to point a client to a tool. This post is a collection of links to online tools for testing web page performance.

Page Speed Online

http://pagespeed.googlelabs.com/

Google’s wonderful Page Speed tool, once only available as a Firefox browser Add-on, finally arrives as an online tool. Achieving a high score (ardamis.com is a 96/100) should be on every web developer’s list of things to do before the culmination of a project.

Enter a URL and Page Speed Online will run performance tests based on a set of best practices known to reduce page load times.

Optimizing caching – keeping your application’s data and logic off the network altogether
Minimizing round-trip times – reducing the number of serial request-response cycles
Minimizing request overhead – reducing upload size
Minimizing payload size – reducing the size of responses, downloads, and cached pages
Optimizing browser rendering – improving the browser’s layout of a page

WebPagetest

http://www.webpagetest.org/

WebPagetest is an excellent application for users who want the same sort of detailed reporting that one gets with Page Speed.

Load time speed test on first view (cold cache) and repeat view (hot cache), first byte and start render
Optimization checklist
Enable keep-alive, HTML compression, image compression, cache static content, combine JavaScript and CSS, and use of CDN
Waterfall
Response headers for each request

Load Impact

http://loadimpact.com/pageanalyzer.php

Load Impact is an online load testing service that lets you load- and stress test your website over the Internet. The page analyzer analyzes your web page performance by emulating how a web browser would load your page and all resources referenced in it. The page and its referenced resources are loaded and important performance metrics are measured and displayed in a load-bar diagram along with other per-resource attributes such as URL, size, compression ratio and HTTP status code.

ByteCheck

http://www.bytecheck.com/

ByteCheck is a super minimal site that return your page’s all-important time to first byte (TTFB). Time to first byte is the time it takes for a browser to start receiving information after it has started to make the request to the server, and is responsible for a visitor’s first impression that a page is fast- or slow-loading.

Web Page Analyzer

http://websiteoptimization.com/services/analyze/

My opinion is that the Web Page Analyzer report is good for beginners without much technical knowledge of things like gzip compression and Expires headers. It’s a bit dated, and is primarily concerned with basics like how many images a page contains. It tells you how fast you can expect your page to load for dial-up visitors, which strikes me as quaint and not particularly useful.

Total HTTP requests
Total size
Total size per object type (CSS, JavaScript, images, etc.)
Analysis of number of files and file size as compared to recommended limits

The Performance Grader

http://www.joomlaperformance.com/component/option,com_performance/Itemid,52/

This is another simplistic analysis of a site, like Web Page Analyzer, that returns its analysis in the form of pass/fail grades on about 14 different tests. I expect that it would be useful for developers who want to show a client a third-party’s analysis of their work, if the third-party is not terribly technically savvy.

One unique thing about this tool, though, is that it totals up the size of all images referenced in CSS files (even those that the current page isn’t using).

HTML Size
Total Size
Total Requests
Generation Time
Number of Hosts
Number of Images
Size of Images
Number of CSS Files
Size of CSS Files
Number of Script Files
Size of Script Files
HTML Encoding
Valid HTML
Frames

A chart illustrating the reduction in comment spam at ardamis.com

In August, 2010, I described a simple method for dramatically reducing the number of spam comments that are submitted to a WordPress blog. The spam comments are rejected before they are checked by Akismet, so they never make it into the database at all.

Now, a few months later, I’m posting a screenshot of the Akismet stats graph from the WordPress dashboard showing the number of spam comments identified by Akismet before and after the system was implemented.

Akismet stats for August - December, 2010

The spike in spam comments detected around November 3rd occurred after an update to WordPress overwrote my altered wp-comments.php file. I replaced the file and the spam dropped back down to single digits per day.

Protecting multiple downloads using unique URLs

A little over a year ago, I wrote a post about a PHP script I had created for protecting a download using a unique URL. The post turned out to be pretty popular, and many of the comments included requests to extend the script in useful ways. So, I’ve finally gotten around to updating the script to generate multiple URLs (up to 20) at a time, to allow different files to be associated with different keys, and to allow brief notes to be attached to the download key.

I’ve also added a simple page that prints out a list of all of the keys generated the date and time that each key was created, the filename of the download on the server that the key accesses, the number of times the key was used, and any attached note. This should make it easier to generate gobs of keys, drop them into an Excel spreadsheet, and help the files’ owner keep track of who’s getting which file, and how often.

The scripts themselves are a little more involved this time around, but the general idea is the same. A unique key is generated and combined with a URL that allows access to a single file on the server. Share the URL/key instead of the URL to the file itself to allow a visitor to download the file, but not to know the location of the file. The key will be valid for a certain length of time and number of downloads, and will stop working once the first limiting condition is met. This should prevent unauthorized downloading due to people sharing the keys.

How it works

There are six main components to this system:

the MySQL database that holds each key, the key creation date and time, the maximum age of the key, the number of times the key has been used, the maximum times the key may be used, the file associated with the key, and the note attached to the key, if any
a generatekey.php page that generates the keys and outputs the corresponding unique URLs
a download.php page that accepts the key, checks its validity, and either initiates the download or rejects the key as invalid
a report.php page that returns all of the data in the database
a config.php file that contains variables such as number of downloads allowed, the maximum allowable age of the key, and the filenames of the downloads, along with the database connection information
the .zip file(s) to be protected

The files

The files, along with two example downloads, are available for download as a .zip file.

Download the protecting multiple downloads PHP script

The MySQL database

Using whatever method you’re comfortable with, create a new MySQL database named “download” and add the following table:

CREATE TABLE `downloadkeys` (
  `uniqueid` varchar(12) NOT NULL default '',
  `timestamp` INT UNSIGNED,
  `lifetime` INT UNSIGNED,
  `maxdownloads` SMALLINT UNSIGNED, 
  `downloads` SMALLINT UNSIGNED default '0',
  `filename` varchar(60) NOT NULL default '',
  `note` varchar(255) NOT NULL default '',
  PRIMARY KEY (uniqueid)
);

How to use the scripts

The scripts require a little setup before they’re ready to be used, so open config.php in your text editor of choice.

Change the values for $db_host, $db_username, $db_password, $db_name to point to your database.

Set the variable $maxdownloads equal to the maximum number of downloads (actually, the number of page loads).

Set the variable $lifetime equal to the keys’ viable duration in seconds (86400 seconds = 24 hours).

Set the variable $realfilenames to the real names of actual download files on the server as a comma-separated list (this is optional; you can also use a single filename or just leave it as empty double-quotes: “”). If you have more than one file to protect, enter the names as a comma-separated list and the script will create a drop-down menu as the Filename field. If you leave the variable blank, the form will display an empty input box as the Filename field.

Set the variable $fakefilename to anything – this is what the visitor’s file will be named when the download is initiated.

I would strongly recommend renaming generatekey.php, as anyone who can view it will be able to create unlimited numbers of keys, and worse, they’ll be able to see the filenames (if you set them in config.php). I would also recommend that the directory you put these files into, and each directory on your site (/images, /css, /js, etc.), contain an index.html file. This is a simple security measure that will prevent visitors from snooping around a directory and viewing its contents (though access to the directory contents is usually prohibited by a setting on the server).

Place all the PHP scripts and your .zip file(s) into the same directory on your server.

That’s all there is to it. Whenever you want to give someone access to the download, visit the generatekey.php page and fill out the form. It will generate a key code, save it to a database, and print out a unique link that you can copy and paste into an email or whatever. The page that the unique link points to checks to see if the key code is legitimate, then checks to see if the code is less than X hours old, then checks to see if it has been used less than X times. The visitor will get a descriptive message for the first unmet condition and the script will terminate. If all three conditions are met, the download starts automatically.

Errors and issues

Note: The download will not initiate automatically, and will actually be output as text on the page, if the download.php page is changed to send headers or any output to the browser. Be careful when making modifications or incorporating this script into another page.

Check the HTTP headers (Google for an online service that does this, or install the LiveHTTPHeaders Firefox plugin) of the download link. If the script is working correctly, you should see Content-Transfer-Encoding: binary and Content-Type: application/octet-stream in the headers. If you’re getting a page of text instead of the zip file, you’ll probably see Content-Type: text/html.

Example HTTP headers for a correctly working download

If the script is working correctly, the HTTP headers will look something like this:

HTTP/1.1 200 OK
Date: Sun, 20 Jun 2010 13:31:50 GMT
Server: Apache
Cache-Control: must-revalidate, post-check=0, pre-check=0, private
Content-Disposition: attachment; filename="bogus_download_name.zip"
Content-Transfer-Encoding: binary
Pragma: public
Content-Length: 132
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/octet-stream

Minifi.de – Another URL shortening service

I’ve finally opened a Twitter account, so you can follow me at http://twitter.com/ardamis. As a social experiment, it’s interesting to watch and discuss, but I haven’t really participated much. On the other hand, I’m very interested in the phenomenon of URL shortening. So, without too much trouble, I put together yet another URL shortening service: Minifi.de (‘minified’, in the jargon). It does pretty much the same thing as tinyurl.com, bit.ly and is.gd – you enter a long URL and it returns a shorter one. I’m about 50% done with the API (it works, so long as the URL valid), so if anyone knows of any clients that allow the user to specify a shortening service, I’d like to test that functionality. I have plans to make it account-based, so that you can track usage statistics and such, but that will only happen after I’m confident that the thing will survive in the wild.

So, feel free to give it a whirl, but know that it’s still in development and that I’ll probably have to wipe the database a few more times before I get it just right.

Protecting a download using a unique URL

Update 6/25/09: I’ve updated the script to include a number of suggestions made in the comments. The new script supports multiple files, up to 20 URLs can be created at a time, and a brief note can be attached to each key. If these features sound useful, please check out the new post at:

Protecting multiple downloads using unique URLs.

A client asked me to develop a simple method for protecting a download (or digital product) by generating a unique URL that can be distributed to authorized users via email. The URL would contain a key that would be valid for a certain amount of time and number of downloads. The key will become invalid once the first of those conditions is exceeded. The idea is that distributing the unique URL will limit unauthorized downloads resulting from the sharing of legitimate download links.

In addition, once the key has been validated, the download starts immediately, preventing the visitor from seeing the actual location of the download file. What’s more, the file name of the download in the “Save as” dialogue box isn’t necessarily the same as the file name of the file on the server, making the file itself pretty much undiscoverable.

How it works

There are five main components to this system:

the MySQL database that holds each key, the key creation time, and the number of times the key has been used
the downloadkey.php page that generates the unique keys and corresponding URLs
the download.php page that accepts the key, verifies its validity, and either initiates the download or rejects the key as invalid
a dbconnect.php file that contains the link to the database and which is included into both of the other PHP files
the download .zip file that is to be protected

Place all three PHP scripts and the .zip file into the same directory on your server.

The MySQL database

Using whatever method you’re comfortable with, create a new MySQL database named “download” and add the following table:

CREATE TABLE `downloadkey` (
  `uniqueid` varchar(255) NOT NULL default '',
  `timestamp` varchar(255) NOT NULL default '',
  `downloads` varchar(255) NOT NULL default '0',
  PRIMARY KEY (uniqueid)
);

The downloadkey.php page

This page generates the key, creates a URL containing the key, and writes the key to the database. Never give out the location of this page – this is for only you to access.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Download Key Generator</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="//ardamis.com/" />
<style type="text/css">
#wrapper {
	font: 15px Verdana, Arial, Helvetica, sans-serif;
	margin: 40px 100px 0 100px;
}
.box {
	border: 1px solid #e5e5e5;
	padding: 6px;
	background: #f5f5f5;
}
</style>
</head>

<body>
<div id="wrapper">

<h2>Download Key Generator</h2>

<?php
// A script to generate unique download keys for the purpose of protecting downloadable goods

require ('dbconnect.php');

	if(empty($_SERVER['REQUEST_URI'])) {
    	$_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'];
	}

	// Strip off query string so dirname() doesn't get confused
	$url = preg_replace('/\?.*$/', '', $_SERVER['REQUEST_URI']);
	$folderpath = 'http://'.$_SERVER['HTTP_HOST'].'/'.ltrim(dirname($url), '/').'/';


// Generate the unique download key
	$key = uniqid(md5(rand()));
//	echo "key: " . $key . "<br />";
	
// Get the activation time
	$time = date('U');
//	echo "time: " . $time . "<br />";
	
// Generate the link
	echo "<p>Here's a new download link:</p>";
	echo "<p><span class=\"box\">" . $folderpath . "download.php?id=" . $key . "</span></p>";

	
// Write the key and activation time to the database as a new row
	$registerid = mysql_query("INSERT INTO downloadkey (uniqueid,timestamp) VALUES(\"$key\",\"$time\")") or die(mysql_error());
?>

<p>&nbsp;</p>
<p>Each time you refresh this page, a unique download key is generated and saved to a database.  Copy and paste the download link into an email to allow the recipient access to the download.</p>
<p>This key will be valid for a certain amount of time and number of downloads, which can be set in the download.php script.  The key will expire and no longer be usable when the first of these conditions is exceeded.</p>
<p>The download page has been written to force the browser to begin the download immediately.  This will  prevent the recipient of the email from discovering the location of the actual download file.</p>

</div>
</body>
</html>

The download.php page

The URL generated by downloadkey.php points to this page. It contains the key validation script and then forces the browser to begin the download if it finds the key is valid.

<?php
// Set the maximum number of downloads (actually, the number of page loads)
$maxdownloads = "2";
// Set the key's viable duration in seconds (86400 seconds = 24 hours)
$maxtime = "86400";

require ('dbconnect.php');

	if(get_magic_quotes_gpc()) {
        $id = stripslashes($_GET['id']);
	}else{
		$id = $_GET['id'];
	}

	// Get the key, timestamp, and number of downloads from the database
	$query = sprintf("SELECT * FROM downloadkey WHERE uniqueid= '%s'",
	mysql_real_escape_string($id, $link));
	$result = mysql_query($query) or die(mysql_error());
	$row = mysql_fetch_array($result);
	if (!$row) { 
		echo "The download key you are using is invalid.";
	}else{
		$timecheck = date('U') - $row['timestamp'];
		
		if ($timecheck >= $maxtime) {
			echo "This key has expired (exceeded time allotted).<br />";
		}else{
			$downloads = $row['downloads'];
			$downloads += 1;
			if ($downloads > $maxdownloads) {
				echo "This key has expired (exceeded allowed downloads).<br />";
			}else{
				$sql = sprintf("UPDATE downloadkey SET downloads = '".$downloads."' WHERE uniqueid= '%s'",
	mysql_real_escape_string($id, $link));
				$incrementdownloads = mysql_query($sql) or die(mysql_error());
				
// Debug		echo "Key validated.";

// Force the browser to start the download automatically

/*
	Variables: 
		$file = real name of actual download file on the server
		$filename = new name of local download file - this is what the visitor's file will actually be called when he/she saves it
*/

   ob_start();
   $mm_type="application/octet-stream";
   $file = "actual_download.zip";
   $filename = "bogus_download_name.zip";
 
   header("Cache-Control: public, must-revalidate");
   header("Pragma: no-cache");
   header("Content-Type: " . $mm_type);
   header("Content-Length: " .(string)(filesize($file)) );
   header('Content-Disposition: attachment; filename="'.$filename.'"');
   header("Content-Transfer-Encoding: binary\n");
 
   ob_end_clean();
   readfile($file);

			}
		}
	}
?>

The dbconnect.php script (database connection)

This is the PHP include referenced by both scripts that contains the database link.

<?php
// Connect to database "download" using: dbname , username , password 
    $link = mysql_connect('localhost', 'root', '') or die("Could not connect: " . mysql_error());
    mysql_select_db("download") or die(mysql_error());
?>

This file will almost certainly require some editing. You will need to specify a host name for your MySQL server and a MySQL username and password in that file at mysql_connect('localhost', 'root', '') so that you can connect to the database you’ve set up. It’s extremely unlikely that your production MySQL database will be installed on localhost with a user “root” and no password.

That’s all there is to it. Whenever you want to give someone access to the download, visit the downloadkey.php page. It will generate a unique key code, save it to a database, and print out a URL that you can copy and paste into an email or whatever. The page at that URL checks to see if the key code is legit, then checks to see if the code is less than X hours old, then checks to see if it has been used less than X times. The visitor will get a descriptive message for the first unmet condition and the script will terminate. If all three conditions are met, the download starts automatically.