Tag Archives: php

I like Akismet, and it’s undeniably effective in stopping the vast majority of spam, but it adds a huge number of comments to the database and a very small percentage of comments still get through to my moderation queue.

It’s annoying to find comments in my moderation queue, but what I really object to is the thousands of records that are added to the database each month that I don’t see.

In the screenshot below, January through April show very few spam comments being detected by Akismet. This is because I was using my cache-friendly method for reducing WordPress comment spam to block spam comments even before Akismet analyzed them.

Akismet stats

In May, I moved hosting providers to asmallorange.com and started with a fresh install of WordPress without implementing my custom spam method, which admittedly was not ideal because it involved changing core files. This left only Akismet between the spammers and my WordPress database. Since that time, instead of 150 or fewer spam comments per month making it into my WordPress database, Akismet was on pace to let in over 10,000.

So, in the spirit of fresh starts and doing things the right way, I created a WordPress plug-in that uses the same timestamp method. It’s actually exactly the same JavaScript and PHP code, just in plug-in form, so it’s not bound to any core files or theme files.

7:21 PM 2/26/2012

I recently ran the spider at www.xml-sitemaps.com against www.ardamis.com and it returned a list of URLs that included a few pages with some suspicious-looking parameters. This is the second time I’ve come across these URLs, so I decided to document what was going on. The first time, I just cleared the cache, spidered the site to preload the cache, and confirmed that the spider didn’t encounter the pages. And then I forgot all about it. But now I’m mad.

Normally, a URL list for a WordPress site includes the various pages of the site, like so:

//ardamis.com/
//ardamis.com/page/2/
//ardamis.com/page/3/

But in the suspicious URL list, there are additional URLs for the pages directly off of the site’s root.

//ardamis.com/
//ardamis.com/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/2/
//ardamis.com/page/2/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/3/
//ardamis.com/page/3/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000

This occurs only for the pagination of the main site’s pages. I did not find URLs containing the parameter ?option=com_google&controller= for any pages that exist under a category or tag, but that also use the /page/2/ convention.

The parameter is the urlencoded version of the text:

?option=com_google&controller=..//..//..//..//..//..//..//..///proc/self/environ00

Exploration

I compared the source code of the pages at the clean URLs vs that of the pages at the bad URLs and found that there was a difference in the pagination code generated by the WP-Paginate plugin.

The good pages had normal-looking pagination links.

<div class="navigation">
<ol class="wp-paginate">
<li><span class="title">Navigation:</span></li>
<li><a href="//ardamis.com/page/2/" class="prev">&laquo;</a></li>
<li><a href='//ardamis.com/' title='1' class='page'>1</a></li>
<li><a href='//ardamis.com/page/2/' title='2' class='page'>2</a></li>
<li><span class='page current'>3</span></li>
<li><a href='//ardamis.com/page/4/' title='4' class='page'>4</a></li>
<li><a href='//ardamis.com/page/5/' title='5' class='page'>5</a></li>
<li><a href='//ardamis.com/page/6/' title='6' class='page'>6</a></li>
<li><a href='//ardamis.com/page/7/' title='7' class='page'>7</a></li>
<li><span class='gap'>...</span></li>
<li><a href='//ardamis.com/page/17/' title='17' class='page'>17</a></li>
<li><a href="//ardamis.com/page/4/" class="next">&raquo;</a></li>
</ol>
</div>    

The bad pages had the suspicious URLs, but were otherwise identical. Other than the URLs in the navigation, there was nothing alarming about the HTML on the bad pages.

I downloaded the entire site and ran a malware scan against the files, which turned up nothing. I also did some full-text searching of the files for the usual base64 decode eval type stuff, but nothing was found. I searched through the tables in my database, but didn’t see any instances of com_google or proc or environ that I could connect to the suspicious URLs.

Google it

Google has turned up a few good links about this problem, including:

  1. http://www.exploitsdownload.com/search/com_/36 – AntiSecurity/Joomla Component Contact Us Google Map com_google Local File Inclusion Vulnerability
  2. http://forums.oscommerce.com/topic/369813-silly-hacker/ – “On a poorly-secured LAMP stack, that would read out your server’s environment variables. That is one step in a process that would grant the hacker root access to your box. Be thankful it’s not working. Hacker is a bad term for this. This is more on the Script Kiddie level.”

    The poster also provided a few lines of code for blocking these URLs in an .htaccess file.

    # Block another hacker
    RewriteCond %{QUERY_STRING} ^(.*)/self/(.*)$ [NC]
    RewriteRule ^.* - [F]
    
  3. http://forums.oscommerce.com/topic/369813-silly-hacker/ – “This was trying for Local File Inclusion vulnerabilities via the Joomla/Mambo script.”
  4. http://core.trac.wordpress.org/ticket/14556 – a bug ticket submitted to WordPress over a year earlier identifying a security hole if the function that generates the pagination isn’t wrapped in a url_esc function that sanitizes the URL. WP-Paginate’s author submits a comment to the thread, and the plugin does use url_esc.

So, what would evidence of an old Joomla exploit be doing on my WordPress site? And what is happening within the WP-Paginate plugin to cause these parameters to appear?

Plugins

It seemed prudent to take a closer look at two of the plugins used on the site.

Ardamis uses the WP-Paginate plugin. The business of generating the /page/2/, /page/3/ URLs is a native WordPress function, so it’s strange to see how those URLs become subject to some sort of injection by way of the WP-Paginate plugin. I tried passing a nonsense parameter in a URL (//ardamis.com/page/3/?foobar) and confirmed that the navigation links created by WP-Paginate contained that ?foobar parameter within each link. This happens on category pages, too. This behavior of adding any parameters passed in the URL to the links it is writing into the page, even if they are urlencoded, is certainly unsettling.

The site also uses the WP Super Cache plugin. While this plugin seems to have been acting up lately, in that it’s not reliably preloading the cache, I can’t make a connection between it and the problem. I also downloaded the cache folder and didn’t see cached copies of these URLs. I turned off caching in WP Super Cache but left the plugin activated, cleared the cache, and then sent the spider against the site again. This time, the URL list didn’t contain any of the bad URLs. Otherwise, the lists were identical. I re-enabled the plugin, attempted to preload the cache (it got through about 70 pages and then stopped), and then ran a few spiders against the site to finish up the preloading. I generated another URL list and the bad URLs didn’t appear in it, either.

A simple fix for the WP-Paginate behavior

The unwanted behavior of the WP-Paginate plugin can be corrected by changing a few lines of code to strip off the GET parameters from the URL. The lines to be changed all reference the function get_pagenum_link. I’m wrapping that function in the string tokenizing function strtok to strip the question mark and everything that follows.

The relevant snippets of the plugin are below.

			
$prevlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page - 1), '?'))
: get_comments_pagenum_link($page - 1);
$nextlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page + 1), '?'))
: get_comments_pagenum_link($page + 1);
			
function paginate_loop($start, $max, $page = 0) {
    $output = "";
    for ($i = $start; $i <= $max; $i++) {
        $p = ($this->type === 'posts') ? esc_url(strtok(get_pagenum_link($i), '?')) : get_comments_pagenum_link($i);
        $output .= ($page == intval($i))
        ? "<li><span class='page current'>$i</span></li>"
        : "<li><a href='$p' title='$i' class='page'>$i</a></li>";
    }
    return $output;
}

Once these changes are made, WP-Paginate will no longer insert any passed GET parameters into the links it’s writing into that page.

Bandaid

The change to the WP-Paginate plugin is what we tend to call a bandaid – it doesn’t fix the problem, it just suppresses the symptom.

I’ve found that once the site picks up the bad URLs, they can be temporarily cleaned by clearing the cache and then using a spider to recreate it. The only thing left to do is determine where they are coming from in the first place.

The facts

Let’s pause to review the facts.

  1. The http://www.xml-sitemaps.com spider sent against //ardamis.com discovers pages with odd parameters that shouldn’t be naturally occurring on the pages
  2. The behavior of the WP-Paginate plugin is to accept any parameters passed and tack them onto the URLs it is generating
  3. Deleting the cached pages created by WP Super Cache and respidering produces a clean list – the bad URLs are absent

So how is the spider finding pages with these bad URLs? How are they first getting added to a page on the site? It would seem likely that they are originating only on the home page, and the absence of the parameters on other pages that use pagination seems to support that theory.

An unsatisfying ending

Well, the day is over. I’ve added my updated WP-Paginate plugin to the site, so hopefully Ardamis has seen the last of the problem, but I’m deeply unsatisfied that I haven’t been able to get to the root cause. I’ve scoured the site and the database, and I can’t find any evidence of the URLs anywhere. If the bad URLs come back again, I’ll not be so quick to clean up the damage, and will instead try to preserve it long enough to make a determination as to their origin.

Update 07 April 2012: It’s happened again. When I spider the site, two pages have the com_google URL. These page have the code appended to the end of the URL created by the WordPress function cancel_comment_reply_link(). This function generates the anchor link in the comments area with an ID of cancel-comment-reply-link. This time, though, I see the hijacked URL used in the link even when I visit the clean URL of the page.

This code is somehow getting onto the site in such a way that it only shows up in the WP Super Cache’d pages. Clearing the cache and revisiting the page returns a clean page. My suspicion is that someone is visiting my pages with the com_google code as part of the URL. WordPress puts the code into a self-referencing link in the comment area. WP Super Cache then updates the cache with this page. I don’t think WordPress can help but work this way with nested comments, but WP Super Cache should know better than to create a cached page from anything but the content from the server.

In the end, because I wasn’t using nested comments to begin with, I chose to remove the block of code that was inserting the link from my theme’s comments.php file.

    <div class="cancel_comment_reply">
        <small><?php cancel_comment_reply_link(); ?></small>
    </div>

I expect that this will be the last time I find this type of exploit on ardamis.com, as I don’t think there is any other mechanism that will echo out on the page the contents of a parameter passed in the URL.

I just picked up an old Dell Precision 690 workstation, which I intend to develop into a file server, a Windows IIS server, and an Ubuntu LAMP server. This monster was built in 2006, but it still has some neat specs and tons of capacity (7 PCI slots, 4 hard drive bays, etc…), should I want to expand further.

Dell Precision 690

Dell Precision 690 Workstation

The main specs

CPU: Dual Core Intel Xeon 5060 3.2GHz, 4M Cache, 1066 MHz FSB
RAM: 2GB DDR2 PC2-5300, CL=5, Fully Buffered, ECC, DDR2-667
HD: SAS Fujitsu MAX3073RC 73GB, 15000 RPM, 16MB Cache
Video: Nvidia Quadro NVS 285 PCI-Express, 128MB

This is not a normal tower

Right away, the size of this thing suggests it isn’t a normal tower. It’s about up to my knee and weights 70 lbs. It feels like it’s made with heavier gauge steel than the typical chassis, but that may be me projecting.

I immediately shopped around for more RAM, obviously. 2GB seems a little thin, even by 2006 standards, when considering the way everything else is high-end. The mainboard has 8 slots and supports up to 32GB, but I figure 6GB is a safe place to start.

The workstation has three enormous fans, like, big-as-your-hand big. Running it with the chassis open causes some sort of thermal protection system to kick in and it spins the fans up to the point that they were blowing stuff on the floor half-way across the room.

The CPU has a big, passive heat sink with six copper pipes and sits between two of those fans. I’m tempted to buy a second CPU, but I’ll hold off.

I’m still on the fence about the SCSI drive. It should be super fast, but I’m a little spoiled by the SSD in my machine at work, so it’s hard to get excited about a mechanical drive, even one running at 15k RPM.

The Nvidia Quadro card is also fanless, and has a bizarre DMS-59 connector. An adapter converts the DMS-59 connector into two DVI outputs.

For a recent project, I needed to create a form that would perform a look up of people names in a MySQL database, but I wanted to use a single input field. To make it easy on the users of the form, I wanted the input field to accept names in either “Firstname Lastname” or “Lastname, Firstname” format, and I wanted it to autocomplete matches as the users typed, including when they typed both names separated by a space or a comma followed by a space.

The Ajax lookup was quick work with jQuery UI’s Autocomplete widget. The harder part was figuring out the most simple table structure and an appropriate SQL query.

A flawed beginning

My people table contains a “first_name” column and a “last_name” column, nothing uncommon there. To get the project out the door, I wrote a PHP function that ran two ALTER TABLE queries on the people table to create two additional columns for pre-formatted strings (column “firstlast”, to be formatted as “Firstname Lastname”, and column “lastfirst”, to be formatted as “Lastname, Firstname”), added indexes on these columns, and then walked through each record in the table, populating these new fields. I then wrote a very straight forward SQL query to perform a lookup on both fields. The PHP and query looked something like this:

// The jQuery UI Autocomplete widget passes the user input as a value for the parameter "name"
$name= $_GET['name'];

// This SQL query uses argument swapping
$query = sprintf("SELECT * FROM people WHERE (`firstlast` LIKE '%1\$s' OR `lastfirst` LIKE '%1\$s') ORDER BY `lastfirst` ASC",
mysql_real_escape_string($name. "%", $link));

This was effective, accurate, and pretty fast, but the addition of columns bothered me and I didn’t like that I needed to run a process to generate those pre-formatted fields each time a record was added to the table (or if a change was made to an existing record). One possible alternative was to watch the input and match either lastname or firstname until the user entered a comma or a space, then explode the string on the comma or space and search more precisely. Once a comma or a space was encountered, I felt pretty sure that I would be able to accurately determine which part of the input was the first name and which was the last name. But this had that same inefficient, clunky bad-code-smell as the extra columns. (Explode is one of those functions that I try to avoid using.) Writing lots of extra PHP didn’t seem necessary or right.

I’m much more comfortable with PHP than with MySQL queries, but I realize that one can do some amazing things within the SQL query, and that it’s probably faster to use SQL to perform some functions. So, I decided that I’d try to work up a query that solved my problem, rather than write more lines of PHP.

CONCAT_WS to the rescue

I Googled around for a bit and settled on using CONCAT_WS to concatenate the first names and last names into a single string be matched, but found it a bit confusing to work with. I kept trying to use it to create an alias, “lastfirst”, and then use the alias in the WHERE clause, which doesn’t work, or I was getting the literal column names back instead of the values. Eventually, I hit upon the correct usage.

The PHP and query now looks like this:

// The jQuery UI Autocomplete widget passes the user input as a value for the parameter "name"
$name= $_GET['name'];

// This SQL query uses argument swapping
$query = sprintf("SELECT *, CONCAT_WS(  ', ',  `last_name`,  `first_name` ) as lastfirst FROM people WHERE (CONCAT_WS(  ', ',  `last_name`,  `first_name` ) LIKE '%1\$s' OR CONCAT_WS(  ' ',  `first_name`,  `last_name` ) LIKE '%1\$s') ORDER BY lastfirst ASC",
mysql_real_escape_string($name. "%", $link));

The first instance of CONCAT_WS isn’t needed for the lookup. The first instance allows me to order the results alphabetically and provides me an array key of “lastfirst” with a value of the person’s name already formatted as “Lastname, Firstname”, so I don’t have to do it later with PHP. The lookup comes from the two instances of CONCAT_WS in the WHERE clause. I haven’t done any performance measuring here, but the results of the lookup get back to the user plenty fast enough, if not just as quickly as the method using dedicated columns.

The result of the query is output back to the page as JSON-formatted data for use in the jQuery Autocomplete.

The end result works exactly as I had hoped. A user of the form is able to type a person’s name in whatever way is comfortable to them, as “Bob Smith” or “Smith, Bob”, and the matches are found either way. The only thing it doesn’t do is output the matches back to the autocompleter in the same format that the user is using. But I can live with that for now.

While setting up a new MySQL account at a GoDaddy hosted web site, I kept getting an error when logging in to phpMyAdmin.

Error
#1045 – Access denied for user

For things like database usernames/passwords and other things that I’ll never have to remember or type, I like to use a long string of random characters. One excellent source of such strings is GRC’s Ultra High Security Password Generator. I typically use a subset of the 63 random alpha-numeric characters (a-z, A-Z, 0-9) in the bottom box. This gives me a good mix of uppercase, lowercase, and numbers, which satisfies the requirements of most password systems that require even minimum complexity.

So, I picked a string of characters for the database name and a different string for the password (making sure the password contained at least 1 uppercase character and 1 number), pasted them into the config.php file I was going to use on the project and then pasted them into the database setup form and created my database. No problem.

I gave it 10 or 15 minutes to get all set up and then launched phpMyAdmin. I copied and pasted the username and password from my config file into the log in fields and wham, I got the #1045.

After much second guessing and more copying and pasting, all with no luck, I tried resetting the password back in the Hosting Control Center. I waited a few more minutes for good measure and tried again. Still, #1045 – Access denied for user.

Then it was time to Google, which turned up a thread full of people with the same experience at http://community.godaddy.com/groups/web-hosting/forum/topic/mysql-login-error-1045-access-denied-for-user/?sid&sp=1&topic_page=1&num=15.

Back in the Control Center, I noticed that the mixed case characters I’d used for the database/username had been converted to lowercase. So I tried using the lowercase version at phpMyAdmin and still no luck.

I submitted a support ticket, as recommended in the thread, and then called Customer Support for good measure.

The guy confirmed that the database was in good shape and that the last password reset took effect, then had me reset it again. And of course, when I tried to log into phpMyAdmin a moment later with the lowercase username, it went right in.

The fix (or a plausible explanation, at least)

The lesson learned here, is that even though the new MySQL database setup form will accept mixed case characters as the database name/username, it will silently convert them to lowercase on you. The phpMyAdmin login, then, is case sensitive, so you may want to copy and paste from the Control Center into phpMyAdmin to be sure you’re feeding it the correct username.

If there is one thing that bothers me about Dreamweaver’s default settings (other than its annoying habit of rewriting valid code as camelCase), it is that double-clicking .php and .asp files launches Dreamweaver in Design view. Who on earth thinks that people generally want to open these kinds of files in anything other than code view?

Well, thankfully, this ridiculous behavior can be changed with a quick registry tweak. Just add an extension to the “Open As Text” value to cause Dreamweaver to always open that file type in code view. The .reg file below assumes you’re running Dreamweaver CS5.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Adobe\Dreamweaver CS5\Helper Applications Preferences]
"Open As Text"=".js .asa .css .cs .config .inc .txt .as .asc .asr .vb .htaccess .htpasswd .php .asp .html"

It would be nice if Dreamweaver were smart enough to realize that, if your preferred layout is Coder, any file should be opened in Code view.

The link below explains how to do the same thing via Edit | Preferences.

Source: Open files in Code view by default

Update 2015-01-02: About a month ago, in early December, 2014, Google announced that it was working on a new anti-spam API that is intended to replace the traditional CAPTCHA challenge as a method for humans to prove that they are not robots. This is very good news.
This week, I noticed that Akismet is adding a hidden input field to the comment form that contains a timestamp (although the plugin’s PHP puts the initial INPUT element within a P element set to DISPLAY:NONE, when the plugin’s JavaScript updates the value with the current timestamp, the INPUT element jumps outside of that P element). The injected code looks something like this:
<input type=”hidden” id=”ak_js” name=”ak_js” value=”1420256728989″>
I haven’t yet dug into the Akismet code to discover what it’s doing with the timestamp, but I’d be pleased if Akismet is attempting to differentiate humans from bots based on behavior.
Update 2015-01-10: To test the effectiveness of the current version of Akismet, I disabled the anti-spam plugin described in this post on 1/2/2015 and re-enabled it on 1/10/2015. In the span of 8 days, Akismet identified 1,153 spam comments and missed 15 more. These latest numbers continue to support my position that Akismet is not enough to stop spam comments.

In the endless battle against WordPress comment spam, I’ve developed and then refined a few different methods for preventing spam from getting to the database to begin with. My philosophy has always been that a human visitor and a spam bot behave differently (after all, the bots we’re dealing with are not Nexus-6 model androids here), and an effective spam-prevention method should be able to recognize the differences. I also have a dislike for CAPTCHA methods that require a human visitor to prove, via an intentionally difficult test, that they aren’t a bot. The ideal method, I feel, would be invisible to a human visitor, but still accurately identify comments submitted by bots.

Spam on ardamis.com in early 2012 - before and after

Spam on ardamis.com - before and after

A brief history of spam fighting

The most successful and simple method I found was a server-side system for reducing comment spam by using a handshake method involving timestamps on hidden form fields that I implemented in 2007. The general idea was that a bot would submit a comment more quickly than a human visitor, so if the comment was submitted too soon after the post page was loaded, the comment was rejected. A human caught in this trap would be able to click the Back button on the browser, wait a few seconds, and resubmit. This proved to be very effective on ardamis.com, cutting the number of spam comments intercepted by Akismet per day to nearly zero. For a long time, the only problem was that it required modifying a core WordPress file: wp-comments-post.php. Each time WordPress was updated, the core file was replaced. If I didn’t then go back and make my modifications again, I would lose the spam protection until I made the changes. As it became easier to update WordPress (via a single click in the admin panel) and I updated it more frequently, editing the core file became more of a nuisance.

A huge facepalm

When Google began weighting page load times as part of its ranking algorithm, I implemented the WP Super Cache caching plugin on ardamis.com and configured it to use .htaccess and mod_rewrite to serve cache files. Page load times certainly decreased, but the amount of spam detected by Akismet increased. After a while, I realized that this was because the spam bots were submitting comments from static, cached pages, and the timestamps on those pages, which had been generated server-side with PHP, were already minutes old when the page was requested. The form processing script, which normally rejects comments that are submitted too quickly to be written by a human visitor, happily accepted the timestamps. Even worse, a second function of my anti-spam method also rejected comments that were submitted 10 minutes or more after the page was loaded. Of course, most of the visitors were being served cached pages that were already more than 10 minutes old, so even legitimate comments were being rejected. Using PHP to generate my timestamps obviously was not going to work if I wanted to keep serving cached pages.

JavaScript to the rescue

Generating real-time timestamps on cached pages requires JavaScript. But instead of a reliable server clock setting the timestamp, the time is coming from the visitor’s system, which can’t be trusted to be accurate. Merely changing the comment form to use JavaScript to generate the first timestamp wouldn’t work, because verifying a timestamp generated on the client-side against one generated server-side would be disastrous.

Replacing the PHP-generated timestamps with JavaScript-generated timestamps would require substantial changes to the system.

Traditional client-side form validation using JavaScript happens when the form is submitted. If the validation fails, the form is not submitted, and the visitor typically gets an alert with suggestions on how to make the form acceptable. If the validation passes, the form submission continues without bothering the visitor. To get our two timestamps, we can generate a first timestamp when the page loads and compare it to a second timestamp generated when the form is submitted. If the visitor submits the form too quickly, we can display an alert showing the number of seconds remaining until the form can be successfully submitted. This client-side validation should hopefully be invisible to most visitors who choose to leave comments, but at the very least, far less irritating than a CAPTCHA system.

It took me two tries to get it right, but I’m going to discuss the less successful method first to point out its flaws.

Method One (not good enough)

Here’s how the original system flowed.

  1. Generate a first JS timestamp when the page is loaded.
  2. Generate a second JS timestamp when the form is submitted.
  3. Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, write a pre-determined passcode to a hidden INPUT element, then submit the form.
  4. After the form contents are sent to the server, use server-side logic to verify that the passcode is present and valid.

The problem was that it seemed that certain bots could parse JavaScript enough to drop the pre-determined passcode into the hidden form field before submitting the form, circumventing the timestamps completely and defeating the system.

Because the timestamps were only compared on the client-side, it also failed to adhere to one of the basic tenants of form validation – that the input must be checked on both the client-side and the server-side.

Method Two (better)

Rather than having the server-side validation be merely a check to confirm that the passcode is present, method two compares the timestamps a second time on the server side. Instead of a single hidden input, we now have two – one for each timestamp. This is intended to prevent a bot from figuring out the ultimate validation mechanism by simply parsing the JavaScript. Finally, the hidden fields are not in the HTML of the page when it’s sent to the browser, but are added to the form via jQuery, which makes it easier to implement and may act as another layer of obfuscation.

  1. Generate a first JS timestamp when the page is loaded and write it to a hidden form field.
  2. Generate a second JS timestamp when the form is submitted and write it to a hidden form field.
  3. Before the form contents are sent to the server, compare the two timestamps, and if enough time has passed, submit the form (client-side validation).
  4. On the form processing page, use server-side logic to compare the timestamps a second time (server-side validation).

This timestamp handshake works more like it did in the proven-effective server-side-only method. We still have to pass something from the comment form to the processing script, but it’s not too obvious from the HTML what is being done with it. Furthermore, even if a bot suspects that the timestamps are being compared, there is no telling from the HTML what the threshold is for distinguishing a valid comment from one that is invalid. (The JavaScript could be parsed by a bot, but the server-side check cannot be, making it possible to require a slightly longer amount of time to elapse in order to pass the server-side check.)

The same downside plagued me

For a long time, far longer than I care to admit, I stubbornly continued to modify the core file wp-comments-post.php to provide the server-side processing. But creating the timestamps and parsing them with a plug-in turned out to be a simple matter of two functions, and in June of 2013 I finally got around to doing it the right way.

The code

The plugin, in all its simplicity, is only 100 lines. Just copy this code into a text editor, save it as a .php file (the name isn’t important) and upload it to the /wp-content/plugins directory and activate it. Feel free to edit it however you like to suit your needs.

<?php

/*
Plugin Name: Timestamp Comment Filter
Plugin URI: //ardamis.com/2011/08/27/a-cache-proof-method-for-reducing-comment-spam/
Description: This plugin measures the amount of time between when the post page loads and the comment is submitted, then rejects any comment that was submitted faster than a human probably would or could.
Version: 0.1
Author: Oliver Baty
Author URI: //ardamis.com

    Copyright 2013  Oliver Baty  (email : obbaty@gmail.com)

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

// http://wordpress.stackexchange.com/questions/6723/how-to-add-a-policy-text-just-before-the-comments
function ard_add_javascript(){

	?>
	
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    ardGenTS1();
});
 
function ardGenTS1() {
    // prepare the form
    $('#commentform').append('<input type="hidden" name="ardTS1" id="ardTS1" value="1" />');
    $('#commentform').append('<input type="hidden" name="ardTS2" id="ardTS2" value="1" />');
    $('#commentform').attr('onsubmit', 'return validate()');
    // set a first timestamp when the page loads
    var ardTS1 = (new Date).getTime();
    document.getElementById("ardTS1").value = ardTS1;
}
 
function validate() {
    // read the first timestamp
    var ardTS1 = document.getElementById("ardTS1").value;
//  alert ('ardTS1: ' + ardTS1);
    // generate the second timestamp
    var ardTS2 = (new Date).getTime();
    document.getElementById("ardTS2").value = ardTS2;
//  alert ('ardTS2: ' + document.getElementById("ardTS2").value);
    // find the difference
    var diff = ardTS2 - ardTS1;
    var elapsed = Math.round(diff / 1000);
    var remaining = 10 - elapsed;
//  alert ('diff: ' + diff + '\n\n elapsed:' + elapsed);
    // check whether enough time has elapsed
    if (diff > 10000) {
        // submit the form
        return true;
    }else{
        // display an alert if the form is submitted within 10 seconds
        alert("This site is protected by an anti-spam feature that requires 10 seconds to have elapsed between the page load and the form submission. \n\n Please close this alert window.  The form may be resubmitted successfully in " + remaining + " seconds.");
        // prevent the form from being submitted
        return false;
    }
}
</script>
	
	<?php
}

add_action('comment_form_before','ard_add_javascript');

// http://wordpress.stackexchange.com/questions/89236/disable-wordpress-comments-api
function ard_parse_timestamps(){

	// Set up the elapsed time, in miliseconds, that is the threshold for determining whether a comment was submitted by a human
	$intThreshold = 10000;
	
	// Set up a message to be displayed if the comment is blocked
	$strMessage = '<strong>ERROR</strong>:  this site uses JavaScript validation to reduce comment spam by rejecting comments that appear to be submitted by an automated method.  Either your browser has JavaScript disabled or the comment appeared to be submitted by a bot.';
	
	$ardTS1 = ( isset($_POST['ardTS1']) ) ? trim($_POST['ardTS1']) : 1;
	$ardTS2 = ( isset($_POST['ardTS2']) ) ? trim($_POST['ardTS2']) : 2;
	$ardTS = $ardTS2 - $ardTS1;
	 
	if ( $ardTS < $intThreshold ) {
	// If the difference of the timestamps is not more than 10 seconds, exit
		wp_die( __($strMessage) );
	}
}
add_action('pre_comment_on_post', 'ard_parse_timestamps');

?>

That’s it. Not so bad, right?

Final thoughts

The screen-shot at the beginning of the post shows the number of spam comments submitted to ardamis.com and detected by Akismet each day from the end of January, 2012, to the beginning of March, 2012. The dramatic drop-off around Jan 20 was when I implemented the method described in this post. The flare-up around Feb 20 was when I updated WordPress and forgot to replace the modified core file for about a week, illustrating one of the hazards of changing core files.

If you would rather not add any hidden form fields to the comment form, you could consider appending the two timestamps to the end of the comment_post_ID field. Because its contents are cast as an integer in wp-comments-post.php when value of the $comment_post_ID variable is set, WordPress won’t be bothered by the extra data at the end of the field, so long as the post ID comes first and is followed by a space. You could then just explode the contents of the comment_post_ID field on the space character, then compare the last two elements of the array.

If you don’t object to meddling with a core file in order to obtain a little extra protection, you can rename the wp-comments-post.php file and change the path in the comment form’s action attribute. I’ve posted logs showing that some bots just try to post spam directly to the wp-comments-post.php file, so renaming that file is an easy way to cut down on spam. Just remember to come back and delete the wp-comments-post.php file each time you update WordPress.

Overview of classes and objects

Objects are the building blocks of the application (ie: the workers in a factory)
Classes can be thought of as blueprints for the objects. Classes describe the objects, which are created in memory.
So, the programmer writes the classes and the PHP interpreter creates the objects from the classes.

A class may contain both variables and functions.
A variable inside a class is called a property.
A function inside a class is called a method.

Instantiation

To create an object, you instantiate a class (you create an instance of the class as an object).
For example, if we have a class named ‘person’ and want to instantiate it as the variable $oliver:

$oliver = new person();

The variable $oliver is referred to as the ‘handle’.

Accessing properties and methods

To access the properties and methods of a class, we use the object’s handle, followed by the arrow operator “->”.
For example, if our class has a method ‘get_name’, we can echo that to the page with:

echo $oliver->get_name();

Note that there are no single or double quotes used in instantiating a class or accessing properties and methods of a class.

Constructors

A class may have a special method called a constructor. The constructor method is called automatically when the object is instantiated.
The constructor method begins with two underscores and the word ‘construct’:

function __construct($variable) { }

One can pass values to the constructor method by providing arguments after the class name.
For example, to pass the name “John Doe” to the constructor method in the ‘person’ class:

$john = new person("John Doe");

! If a constructor exists and expects arguments, you must instantiate the class with the arguments expected by the constructor.

Access modifiers and visibility declarations

Properties must, and methods may, have one of three access modifiers (visibility declarations): public, protected, and private.
Public: can be accessed from outside the class, eg: $myclass->secret_variable;
Protected: can be accessed within the class and by classes derived from the class
Private: can be accessed only within the class

Declaring a property with var makes the property public.

Methods declared without an explicit access modifier are considered public.

! If you call a protected method from outside the class, any PHP output before the call is still processed, but you get an error message when the interpreter gets to that call:

Fatal error: Call to protected method...

Inheritance

Inheritance allows a child class to be created from a parent class, whereby the child has all of the public and protected properties and methods of the parent.

A child class extends a parent class:

class employee extends person {
}

A child class can redefine/override/replace a method in the parent class by reusing the method name.

! A child class’s method’s access modifier can not be more restrictive than that of the parent class. For example, if the parent class has a public set_name() method and the child class’s set_name() method is protected, the class itself will generate a fatal error, and no prior PHP output will be rendered. (In the error below, employee is the child class to person):

Fatal error: Access level to employee::set_name() must be public (as in class person) in E:xampphtdocstesteroopclass_lib.php on line 38

To differentiate between a method in a parent class vs the method as redefined in a child class, one must specifically name the class that contains the method you want to call using the scope resolution operator (::):

person::set_name($new_name);

The scope resolution operator allows access to static, constant, and overridden properties or methods of a class, generally, a parent class. This would be done inside the child class, after redefining a parent’s method of the same name.

It’s also possible to use ‘parent’ to refer to the child’s parent class:

parent::set_name($new_name);

(I’m still a bit vague on this and am looking for examples of situations in which this would be used.)

Classes inside classes

Just as it’s possible to instantiate a class and use the object in a view file, it’s possible to instantiate an object and call its methods from inside another class.

Static properties and methods

Declaring class properties or methods as static makes them accessible without needing an instantiation of the class. A property declared as static can not be accessed with an instantiated class object (though a static method can).

Resources
http://us2.php.net/manual/en/language.oop5.php
http://net.tutsplus.com/tutorials/php/oop-in-php/
http://www.phpfreaks.com/tutorial/oo-php-part-1-oop-in-full-effect
http://www.killerphp.com/tutorials/object-oriented-php/

I’m running XAMPP 1.7.4 [PHP: 5.3.5] (not as a service) on 64-bit Windows 7 Professional.

I installed XAMPP to E:\xampp, and I have pinned the XAMPP Control Panel (xampp-control.exe) to the taskbar for easier access, but starting up xampp-control.exe from that shortcut throws an error:

XAMPP Control

XAMPP Component Status Check failure [3].

Current directory: E:\xampp

Run this program only from your XAMPP root directory.

[OK] [Cancel]

Strangely enough, I even get this error even when running xampp-control.exe from my XAMPP root directory, which really is E:\xampp.

The last post in the thread at http://www.apachefriends.org/f/viewtopic.php?f=16&t=44320&sid=a41029c6a36bbf5b3bb5817f37842340&start=60 offers a simple solution: change the Install_Dir value under HKEY_LOCAL_MACHINE to point to C:\xampp. According to the thread, the error message is due to a bug where the Install_Dir is checked against a hard-coded path on C:\. That may or may not be the case, but the suggested work-around seems to be effective.

Here’s a registry merge for Windows 7 64-bit that will make the change for you.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\xampp]
"Install_Dir"="C:\\xampp"

Now xampp-control.exe launches without the error, and I haven’t noticed anything (PHP, MySQL, etc.) not working because of the bogus path.

Q: Just how quickly does Googlebot visit a page after it is linked to from a post on blogspot.com?
A: Pretty darn quickly.

I am curious about just what data is being passed to pages by user-agents. To try to gather some of this data, I created a test page at Aleph Studios. This page records the keys and values of the $_GET, $_POST, and $_COOKIE arrays, along with the values that belong to the keys starting with HTTP_ from the $_SERVER array, and the value of $_SERVER[‘REQUEST_TIME’]. It bundles all this up into an email and sends it to me each time the page is accessed.

In order to get this page into Google, I posted a link to it on my throwaway blog at Aleph Studios on Blogger. The timestamp of that post is 9:13 pm. The email from the page as triggered by Googlebot is timestamped 9:14 pm. That’s pretty slick.

The HTTP headers sent by Googlebot and recorded by the page are:

HTTP_ACCEPT = */*
HTTP_ACCEPT_ENCODING = gzip,deflate
HTTP_CONNECTION = Keep-alive
HTTP_FROM = googlebot(at)googlebot.com
HTTP_HOST = alephstudios.com
HTTP_USER_AGENT = Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

The page’s filename, title tag, and H1 tag are all “WvVdWfwgMcmypDqv7t”, so it’ll be easy to find in Google later. As I’ve observed in the past, within 20 minutes, the page on blogspot.com containing the string shows up in Google for a search on the string.

New pages don’t really take twenty minutes to show up in Google, but I don’t check Google very quickly after hitting the Publish button. The fastest I’ve personally witnessed a new page on ardamis.com appearing in Google was under 4 minutes.