Tag Archives: plugin

Akismet is not enough to stop spam comments

I like Akismet, and it’s undeniably effective in stopping the vast majority of spam, but it adds a huge number of comments to the database and a very small percentage of comments still get through to my moderation queue.

It’s annoying to find comments in my moderation queue, but what I really object to is the thousands of records that are added to the database each month that I don’t see.

In the screenshot below, January through April show very few spam comments being detected by Akismet. This is because I was using my cache-friendly method for reducing WordPress comment spam to block spam comments even before Akismet analyzed them.

In May, I moved hosting providers to asmallorange.com and started with a fresh install of WordPress without implementing my custom spam method, which admittedly was not ideal because it involved changing core files. This left only Akismet between the spammers and my WordPress database. Since that time, instead of 150 or fewer spam comments per month making it into my WordPress database, Akismet was on pace to let in over 10,000.

So, in the spirit of fresh starts and doing things the right way, I created a WordPress plug-in that uses the same timestamp method. It’s actually exactly the same JavaScript and PHP code, just in plug-in form, so it’s not bound to any core files or theme files.

What is this Joomla exploit doing on my WordPress site?

7:21 PM 2/26/2012

I recently ran the spider at www.xml-sitemaps.com against www.ardamis.com and it returned a list of URLs that included a few pages with some suspicious-looking parameters. This is the second time I’ve come across these URLs, so I decided to document what was going on. The first time, I just cleared the cache, spidered the site to preload the cache, and confirmed that the spider didn’t encounter the pages. And then I forgot all about it. But now I’m mad.

Normally, a URL list for a WordPress site includes the various pages of the site, like so:

//ardamis.com/
//ardamis.com/page/2/
//ardamis.com/page/3/

But in the suspicious URL list, there are additional URLs for the pages directly off of the site’s root.

//ardamis.com/
//ardamis.com/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/2/
//ardamis.com/page/2/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000
//ardamis.com/page/3/
//ardamis.com/page/3/?option=com_google&controller=..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000

This occurs only for the pagination of the main site’s pages. I did not find URLs containing the parameter ?option=com_google&controller= on any category or tag pages, even though those pages also use the /page/2/ convention.

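The parameter is the urlencoded version of the following text. The %00 near the end decodes to a null byte, which is invisible here, so only the trailing 00 shows: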

?option=com_google&controller=..//..//..//..//..//..//..//..///proc/self/environ00
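The decoding is easy to reproduce with PHP’s rawurldecode(). This is only a small sketch for inspecting the string. The variable names are my own, and the null byte is swapped for a visible \0 marker so that it shows up when printed.

```php
<?php
// The value of the "controller" parameter, copied from the suspicious URLs.
$encoded = '..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F..%2F%2F%2Fproc%2Fself%2Fenviron%0000';

// rawurldecode() turns each %2F into "/" and the %00 into a NUL byte.
$decoded = rawurldecode($encoded);

// Swap the invisible NUL byte for a printable marker before echoing.
$printable = str_replace("\0", '\0', $decoded);

echo $printable, "\n";
```

The long run of ../ segments is a directory traversal meant to climb out of the web root, and the null byte is there to cut off anything a vulnerable script might append to the path.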

Exploration

I compared the source code of the pages at the clean URLs vs that of the pages at the bad URLs and found that there was a difference in the pagination code generated by the WP-Paginate plugin.

The good pages had normal-looking pagination links.

<div class="navigation">
<ol class="wp-paginate">
<li><span class="title">Navigation:</span></li>
<li><a href="//ardamis.com/page/2/" class="prev">&laquo;</a></li>
<li><a href='//ardamis.com/' title='1' class='page'>1</a></li>
<li><a href='//ardamis.com/page/2/' title='2' class='page'>2</a></li>
<li><span class='page current'>3</span></li>
<li><a href='//ardamis.com/page/4/' title='4' class='page'>4</a></li>
<li><a href='//ardamis.com/page/5/' title='5' class='page'>5</a></li>
<li><a href='//ardamis.com/page/6/' title='6' class='page'>6</a></li>
<li><a href='//ardamis.com/page/7/' title='7' class='page'>7</a></li>
<li><span class='gap'>...</span></li>
<li><a href='//ardamis.com/page/17/' title='17' class='page'>17</a></li>
<li><a href="//ardamis.com/page/4/" class="next">&raquo;</a></li>
</ol>
</div>

The bad pages had the suspicious URLs, but were otherwise identical. Other than the URLs in the navigation, there was nothing alarming about the HTML on the bad pages.

I downloaded the entire site and ran a malware scan against the files, which turned up nothing. I also did some full-text searching of the files for the usual base64 decode eval type stuff, but nothing was found. I searched through the tables in my database, but didn’t see any instances of com_google or proc or environ that I could connect to the suspicious URLs.

Google it

Google has turned up a few good links about this problem, including:

http://www.exploitsdownload.com/search/com_/36 – AntiSecurity/Joomla Component Contact Us Google Map com_google Local File Inclusion Vulnerability
http://forums.oscommerce.com/topic/369813-silly-hacker/ – “On a poorly-secured LAMP stack, that would read out your server’s environment variables. That is one step in a process that would grant the hacker root access to your box. Be thankful it’s not working. Hacker is a bad term for this. This is more on the Script Kiddie level.”
The poster also provided a few lines of code for blocking these URLs in an .htaccess file.
```
# Block another hacker
RewriteCond %{QUERY_STRING} ^(.*)/self/(.*)$ [NC]
RewriteRule ^.* - [F]
```
http://forums.oscommerce.com/topic/369813-silly-hacker/ – “This was trying for Local File Inclusion vulnerabilities via the Joomla/Mambo script.”
http://core.trac.wordpress.org/ticket/14556 – a bug ticket submitted to WordPress over a year earlier identifying a security hole if the function that generates the pagination isn’t wrapped in the esc_url function that sanitizes the URL. WP-Paginate’s author submitted a comment to the thread, and the plugin does use esc_url.

So, what would evidence of an old Joomla exploit be doing on my WordPress site? And what is happening within the WP-Paginate plugin to cause these parameters to appear?

Plugins

It seemed prudent to take a closer look at two of the plugins used on the site.

Ardamis uses the WP-Paginate plugin. The business of generating the /page/2/, /page/3/ URLs is a native WordPress function, so it’s strange to see how those URLs become subject to some sort of injection by way of the WP-Paginate plugin. I tried passing a nonsense parameter in a URL (//ardamis.com/page/3/?foobar) and confirmed that the navigation links created by WP-Paginate contained that ?foobar parameter within each link. This happens on category pages, too. This behavior of adding any parameters passed in the URL to the links it is writing into the page, even if they are urlencoded, is certainly unsettling.

The site also uses the WP Super Cache plugin. While this plugin seems to have been acting up lately, in that it’s not reliably preloading the cache, I can’t make a connection between it and the problem. I also downloaded the cache folder and didn’t see cached copies of these URLs. I turned off caching in WP Super Cache but left the plugin activated, cleared the cache, and then sent the spider against the site again. This time, the URL list didn’t contain any of the bad URLs. Otherwise, the lists were identical. I re-enabled the plugin, attempted to preload the cache (it got through about 70 pages and then stopped), and then ran a few spiders against the site to finish up the preloading. I generated another URL list and the bad URLs didn’t appear in it, either.

A simple fix for the WP-Paginate behavior

The unwanted behavior of the WP-Paginate plugin can be corrected by changing a few lines of code to strip off the GET parameters from the URL. The lines to be changed all reference the function get_pagenum_link. I’m wrapping that function in the string tokenizing function strtok to strip the question mark and everything that follows.

The relevant snippets of the plugin are below.

			
$prevlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page - 1), '?'))
: get_comments_pagenum_link($page - 1);
$nextlink = ($this->type === 'posts')
? esc_url(strtok(get_pagenum_link($page + 1), '?'))
: get_comments_pagenum_link($page + 1);

			
function paginate_loop($start, $max, $page = 0) {
    $output = "";
    for ($i = $start; $i <= $max; $i++) {
        $p = ($this->type === 'posts') ? esc_url(strtok(get_pagenum_link($i), '?')) : get_comments_pagenum_link($i);
        $output .= ($page == intval($i))
        ? "<li><span class='page current'>$i</span></li>"
        : "<li><a href='$p' title='$i' class='page'>$i</a></li>";
    }
    return $output;
}

Once these changes are made, WP-Paginate will no longer insert any passed GET parameters into the links it’s writing into that page.
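To see what strtok is doing in isolation, here is a minimal sketch in plain PHP, outside of WordPress. The helper name strip_query_string is hypothetical and is not part of WP-Paginate; it only mirrors the strtok(..., '?') call used in the snippets above.

```php
<?php
// Hypothetical helper: returns the part of a URL before the first "?".
function strip_query_string($url) {
    // Calling strtok() with two arguments starts a fresh tokenization of $url.
    $base = strtok($url, '?');
    // strtok() returns false when there is nothing to tokenize.
    return ($base === false) ? '' : $base;
}

// A URL carrying an unwanted parameter loses it...
echo strip_query_string('//ardamis.com/page/3/?foobar'), "\n";
// ...and a clean URL passes through unchanged.
echo strip_query_string('//ardamis.com/page/3/'), "\n";
```

One caveat: strtok skips leading delimiters, so a string that begins with a question mark would not be emptied. That shouldn’t matter here, because the links returned by get_pagenum_link start with the site’s address.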

Bandaid

The change to the WP-Paginate plugin is what we tend to call a bandaid – it doesn’t fix the problem, it just suppresses the symptom.

I’ve found that once the site picks up the bad URLs, they can be temporarily cleaned by clearing the cache and then using a spider to recreate it. The only thing left to do is determine where they are coming from in the first place.

The facts

Let’s pause to review the facts.

The http://www.xml-sitemaps.com spider sent against //ardamis.com discovers pages with odd parameters that shouldn’t be naturally occurring on the pages
The behavior of the WP-Paginate plugin is to accept any parameters passed and tack them onto the URLs it is generating
Deleting the cached pages created by WP Super Cache and respidering produces a clean list – the bad URLs are absent

So how is the spider finding pages with these bad URLs? How are they first getting added to a page on the site? It would seem likely that they are originating only on the home page, and the absence of the parameters on other pages that use pagination seems to support that theory.

An unsatisfying ending

Well, the day is over. I’ve added my updated WP-Paginate plugin to the site, so hopefully Ardamis has seen the last of the problem, but I’m deeply unsatisfied that I haven’t been able to get to the root cause. I’ve scoured the site and the database, and I can’t find any evidence of the URLs anywhere. If the bad URLs come back again, I’ll not be so quick to clean up the damage, and will instead try to preserve it long enough to make a determination as to their origin.

Update 07 April 2012: It’s happened again. When I spider the site, two pages have the com_google URL. These pages have the code appended to the end of the URL created by the WordPress function cancel_comment_reply_link(). This function generates the anchor link in the comments area with an ID of cancel-comment-reply-link. This time, though, I see the hijacked URL used in the link even when I visit the clean URL of the page.

This code is somehow getting onto the site in such a way that it only shows up in the WP Super Cache’d pages. Clearing the cache and revisiting the page returns a clean page. My suspicion is that someone is visiting my pages with the com_google code as part of the URL. WordPress puts the code into a self-referencing link in the comment area. WP Super Cache then updates the cache with this page. I don’t think WordPress can help but work this way with nested comments, but WP Super Cache should know better than to create a cached page from anything but the content from the server.

In the end, because I wasn’t using nested comments to begin with, I chose to remove the block of code that was inserting the link from my theme’s comments.php file.

    <div class="cancel_comment_reply">
        <small><?php cancel_comment_reply_link(); ?></small>
    </div>

I expect that this will be the last time I find this type of exploit on ardamis.com, as I don’t think there is any other mechanism that will echo out on the page the contents of a parameter passed in the URL.

Sitelinks are back on Ardamis

As of early October, Ardamis.com has its Google sitelinks back. I first noticed them back in July of 2009, when Ardamis had a toolbar PageRank of 6. Changes to Google’s algorithm later cost the site the sitelinks and reduced the PR to 5, which is how the site has appeared for the last year or so. Three months ago, in July of 2010 and one year after the sitelinks appeared, I noticed that all of the pages combined had over one million inbound links.

This is what a Google search for ardamis returns:

Ardamis' Google sitelinks

The second result returned, Final Fantasy XIII freezing on Xbox 360, is among my longest posts, has 91 comments, and enjoys some of the best inbound links of any page on the site, including from the forums at Xbox.com, Kotaku and GamesRadar.

The third result is my primary competition for the term ardamis, which briefly held the number one ranking a few months ago. That site has some one-line site links.

The actual mechanics of obtaining sitelinks remains a mystery, but there are plenty of people who are willing to speculate (and a few brave enough to promise they can deliver them for a price).

I’ve been posting more frequently, the site now uses the WP-Paginate plugin, and, according to Google’s Webmaster Tools, the home page alone now has well over one million inbound links. Otherwise, it’s been business as usual here.

Over one million inbound links to the home page of Ardamis.com

I’m not going to speculate about how to get sitelinks or whether one or more of the changes in the last year was the catalyst, but Google does say to use descriptive and non-repetitive anchor and alt text in a site’s internal links and to keep important pages within a few clicks of the home page. These are very basic, fundamental things that any site should do, but it bears repeating.

Fixing warnings in the WordPress Sociable plugin

This plugin is once again actively supported. Please download the latest version from http://wordpress.org/extend/plugins/sociable/.

I’ve fixed some errors that I was experiencing with version 2.0 (dated 2007-02-02) of the Sociable WordPress plugin by Peter Harkins. Specifically, when running WordPress 2.2+, I would get the following warnings when saving changes:

Warning: implode() [function.implode]: Bad arguments. in PATH\wp-content\plugins\sociable\sociable.php on line 651

Warning: Invalid argument supplied for foreach() in PATH\wp-content\plugins\sociable\sociable.php on line 762

For each button, I’d get another error:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in PATH\wp-content\plugins\sociable\sociable.php on line 797

I think I’ve narrowed the cause down to the way the plugin was writing the newly selected options to the database.

In addition to fixing those warnings, I also corrected a few behaviors:

The StumbleUpon button didn’t send the URL to the correct address at http://www.stumbleupon.com/. The button now works correctly, and also sends the page’s title.
In Options -> Sociable, leaving the field for “text displayed in front of the icons” blank now results in no extraneous code being inserted into the page. Leaving the field blank also eliminates the popup “These icons link to social bookmarking sites…” text.
The links to the social networking sites are now all rel="nofollow".

A plugin for adding the post date to wp_get_archives

The WordPress function wp_get_archives('type=postbypost') displays a lovely list of posts, but won’t show the date of each post. This plugin adds each post’s date to those ‘postbypost’ lists, like so:

Add dates to wp_get_archives

Usage

Upload and activate the plugin
Edit your theme, replacing wp_get_archives('type=postbypost') with if (function_exists('ard_get_archives')) ard_get_archives();

The function ard_get_archives(); replaces wp_get_archives('type=postbypost'), meaning you don’t need to specify type=postbypost. You can use all of the wp_get_archives() parameters except ‘type’ and ‘show_post_count’ (limit, format, before, and after). In addition, there’s a new parameter: show_post_date, that you can use to hide the date, but the plugin will show the date by default.

show_post_date
(boolean) Display date of posts in an archive (1 – true) or do not (0 – false). For use with ard_get_archives(). Defaults to 1 (true).
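As a rough usage sketch, this is how a theme might ask for the ten most recent posts with the date hidden. I’m assuming here that ard_get_archives() takes the same query-string style arguments as wp_get_archives(); check the plugin source before relying on that.

```php
<?php
// Assumption: ard_get_archives() accepts query-string style arguments,
// the way wp_get_archives() does.
$args = http_build_query(array(
    'limit'          => 10, // only the ten most recent posts
    'show_post_date' => 0,  // hide the date (the plugin shows it by default)
));

// Guarded so the theme keeps working if the plugin is deactivated.
if (function_exists('ard_get_archives')) {
    ard_get_archives($args);
}
```

Building the string with http_build_query() is optional. Writing 'limit=10&show_post_date=0' by hand is equivalent.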

Customizing the date

By default, the plugin displays the date as “(MM/DD/YYYY)”, but you can change this to use any standard PHP date characters by editing the plugin at the line:

$arc_date = date('m/d/Y', strtotime($arcresult->post_date));  // new

The date is wrapped in <span class="recentdate"> tags, so you can style the date independently of the link.
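If you want to preview a format before editing the plugin, a short standalone script will do it. This is just a sketch: the sample post date and the list of formats are made up for illustration, and the time zone is pinned so the output doesn’t depend on the server’s settings.

```php
<?php
// Pin the time zone so the example behaves the same on any server.
date_default_timezone_set('UTC');

// A sample value in the format WordPress stores in post_date.
$post_date = '2012-02-26 19:21:00';

// A few standard PHP date formats to compare.
$formats = array('m/d/Y', 'j F Y', 'Y-m-d', 'M j, Y');

foreach ($formats as $format) {
    $arc_date = date($format, strtotime($post_date));
    // Wrapped in the same span the plugin uses, so it can be styled.
    echo '<span class="recentdate">' . $arc_date . '</span>', "\n";
}
```

Whichever format string looks right can then be pasted into the date() call in the plugin.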

How does it work?

The plugin replaces the ‘postbypost’ part of the function wp_get_archives, and adds the date to $before. The relevant code is below. You can compare it to the corresponding lines in general-template.php.

	} elseif ( ( 'postbypost' == $type ) || ('alpha' == $type) ) {
		('alpha' == $type) ? $orderby = "post_title ASC " : $orderby = "post_date DESC ";
		$arcresults = $wpdb->get_results("SELECT * FROM $wpdb->posts $join $where ORDER BY $orderby $limit");
		if ( $arcresults ) {
			$beforebefore = $before;  // new
			foreach ( $arcresults as $arcresult ) {
				if ( $arcresult->post_date != '0000-00-00 00:00:00' ) {
					$url  = get_permalink($arcresult);
					$arc_title = $arcresult->post_title;
					$arc_date = date('m/d/Y', strtotime($arcresult->post_date));  // new
					if ( $show_post_date )  // new
						$before = $beforebefore . '<span class="recentdate">' . $arc_date . '</span>';  // new
					if ( $arc_title )
						$text = strip_tags(apply_filters('the_title', $arc_title));
					else
						$text = $arcresult->ID;
					echo get_archives_link($url, $text, $format, $before, $after);
				}
			}
		}
	}

The lines ending in ‘// new’ are the only changes.

So you want the date to appear after the title? Edit the plugin to modify $after, instead:

	} elseif ( ( 'postbypost' == $type ) || ('alpha' == $type) ) {
		('alpha' == $type) ? $orderby = "post_title ASC " : $orderby = "post_date DESC ";
		$arcresults = $wpdb->get_results("SELECT * FROM $wpdb->posts $join $where ORDER BY $orderby $limit");
		if ( $arcresults ) {
			$afterafter = $after;  // new
			foreach ( $arcresults as $arcresult ) {
				if ( $arcresult->post_date != '0000-00-00 00:00:00' ) {
					$url  = get_permalink($arcresult);
					$arc_title = $arcresult->post_title;
					$arc_date = date('j F Y', strtotime($arcresult->post_date));  // new
					if ( $show_post_date )  // new
						$after = '&nbsp;(' . $arc_date . ')' . $afterafter;  // new
					if ( $arc_title )
						$text = strip_tags(apply_filters('the_title', $arc_title));
					else
						$text = $arcresult->ID;
					echo get_archives_link($url, $text, $format, $before, $after);
				}
			}
		}
	}

Download

Get the files here: (Current version: 0.1 beta)

Download the Ardamis DateMe WordPress Plugin

Apricot – A Minimalist WordPress Theme

Apricot is a text-heavy and graphic-light, widget- and tag-supporting minimalist WordPress theme built on a Kubrick foundation. Apricot validates as XHTML 1.0 Strict and uses valid CSS. It natively supports the excellent Other Posts From Cat and the_excerpt Reloaded plugins, should you want to install them.

WordPress version 2.3 introduces native support for ‘tags’, a method of organizing posts according to key words. Apricot has been updated to use this native tag system. The tag cloud will appear in the sidebar and the tags for each post appear above the meta data.

I used Apricot on this site for over a year, making little tweaks and adjustments the whole time, so the theme is pretty thoroughly tested in a variety of different browsers and resolutions. While the markup is derived from the WordPress default theme, Kubrick, I’ve added a few modifications of my own. I’ve listed some of these changes below.

header.php

Title tag reconfigured to display “Page Title | Site Name”

single.php

Post title is now wrapped in H1 tags
Metadata shows when the post was last modified (if ever)
Added links to social bookmarking/blog indexing sites: Del.icio.us, Digg, Furl, Google Bookmarks, and Technorati
I’ve published a fix for the Sociable plugin, which I’m now using instead of hard-coded links
If the Other Posts From Cat plugin is active, the theme will use it
Comments by the post’s author can be styled independently

page.php

Displays the page’s last modified date (instead of date of publication)

index.php

Displays the full text of the latest post and an excerpt from each of the next nine most recent posts
Native support for the_excerpt Reloaded plugin, if active

sidebar.php

Displays tag cloud, if tags are enabled

search.php

If no results found, displays the site’s most recent five posts

404.php

Displays the site’s most recent five posts

footer.php

Archive and index page titles + blog name wrapped in H1 tags

Screen shot

Search engine optimization

Apricot takes care of most of the on-page factors that Google values highly. It places the post’s title at the beginning of the title tag and in a H1 tag near the top of the page. It is free of extraneous markup and the navigation is easily spiderable. It generates what I think is a pretty logical site structure from the various post and category pages, though I have yet to study the effect of the new tagging system.

I’ve had a few top-ranked pages with this and other structurally similar layouts. Your mileage with the search engines may vary, but the layout uses fundamentally sound structural markup, which should give your site a good start.

Download

Download the theme from http://wordpress.org/extend/themes/apricot or from the link below.

Download the Apricot WordPress Theme

What if I want to use an image as a header?

Lots of people would rather use a graphic as a header, including me, but the WordPress guys insist that each theme uploaded to http://wordpress.org/extend/themes/ display the blog title and tag line.

If you want to replace the blog title and tag line with an image, download this zip file and follow these instructions (also included in readme.txt).

1. Make a PNG image, name it “header.png” and upload it to the /wp-content/themes/apricot/images/ folder. It should be 800px wide by 130px tall, or less.

2. Replace the original Apricot theme’s header.php file with the header.php file from this folder.

Download the Apricot Image Header

A WordPress Plugin for Title Case Capitalization

I’ve written a WordPress plugin that will convert the page title and post title to ‘title case’ capitalization. Title case is also often referred to as “headline style”, and incorrectly as “initial caps” or “init caps”. Title case means that the first letter of each word is capitalized, except for certain small words, such as articles, coordinating conjunctions, and short prepositions. The first and last words in the title are always capitalized.

This plugin may be useful if you’re trying to give the titles on your site a consistent appearance, but it’s no substitute for writing a good title. There are way too many exceptions and rules to make a simple script behave correctly all of the time.

The plugin is smart enough to not capitalize the following:

Coordinating conjunctions (and, but, or, nor, for)
Prepositions of four or fewer letters (with, to, for, at, and so on) (limited)
Articles (a, an, the) unless the article is the first word in the title

But the plugin isn’t perfect. It won’t capitalize an article that is the last word in the title. It fails on subordinating conjunctions. It conservatively de-capitalizes only some of the prepositions, hopefully reducing the chance of incorrect behavior. For example, it leaves the word “over” capitalized, because “over” can be an adverb, an adjective, a noun, or a verb (caps) or a preposition (not caps), and determining how a word is being used in a title is really beyond the scope of a humble plugin.

The plugin requires you to edit it for certain product names, like “iPod”, and cool-people names, like “Olivia d’Abo” or “Jimmy McNulty”. It’s not savvy enough to know that acronyms, like “HTML”, should be all caps unless they’re used in particular ways, such as in the case of “Using the .html Suffix”, unless you tell it. That said, editing the plugin for these particular words is very easy.

Even with all these limitations, it beats using CSS to {text-transform: capitalize} the titles or just applying PHP’s ucwords() to the entire thing. But I’m guessing that dissatisfaction with one or both of those two methods is what brought you to this page in the first place.

On the upside, it capitalizes any word following a semicolon or a colon, e.g.: “Apollo: A Retrospective Analysis”. It also capitalizes any word immediately preceded by a double or single quote, but only if you haven’t bypassed WordPress’s fancy quotes feature.

How it works

The plugin first finds all words that begin with a double or single fancy WordPress quote and adds a space behind the quote. It capitalizes all of the words in the title with ucwords(), then selectively de-capitalizes some of the words using preg_replace(). It then uses str_ireplace(), a case-insensitive string replace function, to correct the odd capitalization of certain other words. Finally, it removes the spaces behind the quotes.

The code

This is what the code looks like. It should be pretty easy to follow what’s happening.

<?php

function ardamis_titlecase($title) {
		$title = preg_replace("/&#8220;/", '&#8220; ', $title); // find double quotes and add a space behind each instance
 		$title = preg_replace("/&#8216;/", '&#8216; ', $title); // find single quotes and add a space behind each instance
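		// de-capitalize certain words unless they follow a colon or semicolon
		// (preg_replace_callback() stands in for preg_replace() with the /e modifier, which was removed in PHP 7)
		$title = preg_replace_callback('/(?<=(?<!:|;)\W)(A|An|And|At|But|By|Else|For|From|If|In|Into|Nor|Of|On|Or|The|To|With)(?=\W)/',
			function ($m) { return strtolower($m[1]); }, ucwords($title));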
		$specialwords = array("iPod", "iMovie", "iTunes", "iPhone", " HTML", ".html", " PHP", ".php"); // form a list of specially treated words
		$title = str_ireplace($specialwords, $specialwords, $title); // replace the specially treated words
		$title = preg_replace("/&#8220; /", '&#8220;', $title); // remove the space behind double quotes
		$title = preg_replace("/&#8216; /", '&#8216;', $title); // remove the space behind single quotes

		return $title;
}

add_filter('wp_title', 'ardamis_titlecase');
add_filter('the_title', 'ardamis_titlecase');

?>

Download

Download the plugin, upload it to your site, and activate it.

Download the Title Case Capitalization WordPress plugin

Further customization

The plugin won’t alter words written in all caps or CamelCase. You could use ucwords(strtolower($title)) to convert the entire $title to lowercase before applying ‘ucwords’. This may fix instances where someone has typed in a bunch of titles with the caps lock key on. But you’ll then have to compensate for words that should be all caps, like ‘HTML’, ‘NBC’, or ‘WoW’, in $specialwords.
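Here is a small sketch of that lowercase-first approach on its own, using a made-up title. It leaves out the de-capitalization step from the plugin, so small words like “And” stay capitalized in this example.

```php
<?php
// Words whose capitalization should be restored after ucwords().
// The leading space keeps "HTML" from matching inside a longer word.
$specialwords = array(' HTML', ' NBC', ' WoW');

// A title typed with the caps lock key on.
$title = 'WATCHING NBC AND LEARNING HTML';

// Lowercase everything first, then capitalize the first letter of each word.
$title = ucwords(strtolower($title));

// The case-insensitive replace puts the special words back in their proper form.
$title = str_ireplace($specialwords, $specialwords, $title);

echo $title, "\n";
```

The trade-off is that every acronym or brand name you care about has to be listed in $specialwords, because the original capitalization is thrown away in the first step.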

An alternative using a ‘foreach’ loop

It’s possible to do something similar using a foreach loop. This isn’t as graceful, in my opinion, but I suppose it’s possible that someone may find it works better.

<?php

function ardamis_titlecase($title) {
	$donotcap = array('a','an','and','at','but','by','else','for','from','if','in','into','nor','of','on','or','the','to','with'); 
	// Split the string into separate words 
	$words = explode(' ', $title); 
	foreach ($words as $key => $word) { 
		// Capitalize all but the $donotcap words and the first word in the title
		if ($key == 0 || !in_array($word, $donotcap)) $words[$key] = ucwords($word); 
		if (preg_match("/^&#8220;/", $word))
			$words[$key] = '&#8220;' . ucwords(substr($word, 7));
		elseif (preg_match("/^&#8216;/", $word))
			$words[$key] = '&#8216;' . ucwords(substr($word, 7));
	} 
	// Join the words back into a string 
	$newtitle = implode(' ', $words); 
	return $newtitle; 
}

add_filter('wp_title', 'ardamis_titlecase');
add_filter('the_title', 'ardamis_titlecase');

?>

Credits

Thanks to Chris for insight into the preg_replace code at http://us2.php.net/ucwords. Thanks to Thomas Rutter for insight into the foreach code at SitePoint Blogs » Title Case in PHP.

Optimizing the Syntax in the WordPress Title Tag

This post was written in 2006. As of WordPress 2.5 (released in 2008), a new seplocation parameter has been added to wp_title. This allows you to reverse the page title and blog name in the title tag, in much the same way as I have described in this post. The page at http://codex.wordpress.org/Function_Reference/wp_title provides this example:

<title><?php wp_title('|',true,'right'); ?><?php bloginfo('name'); ?></title>

I’d recommend using it, instead of the admittedly complicated instructions below.

Getting the title tag just right in WordPress isn’t as easy as it ought to be. Currently, a popular title syntax for SEO purposes shows the page’s title, followed by a pipe separator, followed by the site’s name. In practice, this preferred syntax would appear as “Page Title | Site Name”. For whatever reason, the default theme in WordPress has this order reversed, so that each page’s title starts with the blog name, followed by a » separator, some useless clutter, another » separator and then the page’s title. The instructions below will help you optimize the title tag to take advantage of the preferred method.

The code for the default WordPress title tag, which is found in the “header.php” file, looks like this:

<title><?php bloginfo('name'); ?> <?php if ( is_single() ) { ?> &raquo; Blog Archive <?php } ?> <?php wp_title(); ?></title>

It seems like it should be an easy thing to clean up. We remove the unnecessary “Blog Archive” stuff and then switch the two title template tags, putting <?php bloginfo('name'); ?> behind <?php wp_title(); ?>.

Our title tag code now looks like this:

<title><?php wp_title(); ?><?php bloginfo('name'); ?></title>

But if you make this obvious change and reload one of your blog’s post pages in your browser, you’ll notice that the separator, which is inextricably part of the wp_title template tag, wants to be in front of the page title and is now the very first character in your browser’s title bar, resulting in something like “» Page Title Blog Title”. Furthermore, we are missing a desired separator between the page title and the blog title.

Let’s first do something about that initial separator. The wp_title tag we’ve been using so far, <?php wp_title(); ?>, is abbreviated, meaning that there are some options that are being allowed to fall back to their default states because we haven’t specifically provided otherwise. Changing the behavior of the wp_title separator is as easy as manipulating these options in the unabbreviated wp_title template tag. The full tag, including the options, looks something like: <?php wp_title('sep', display); ?>, where ‘sep’ stands for whatever separator you want and display is either “true” or “false”, depending on whether you want the title displayed. For example, if you want the pipe symbol “|” to appear at the beginning of your post title, you would use: <?php wp_title('|'); ?>. (The display option defaults to “true”, which is what we want here, so I’ll omit that part in the future for the sake of brevity.)

This fiddling with different separators works just fine when the elements of the title are in the default order, but when we put wp_title at the beginning of our title tag, we get a separator as the first character in our title. We don’t want a separator in front of the Page Title, so we will use the ‘sep’ option described above to tell WordPress to use an empty string (represented by the absence of text between two quotes) as the separator, like so: <?php wp_title(''); ?>. This is the preferred method for removing the leading separator from the wp_title tag. Now the code for our title tag looks like:

<title><?php wp_title(''); ?><?php bloginfo('name'); ?></title>

This title will cause your browser’s title bar to display “Page Title Blog Name”. We are getting closer to what we want.

Without going into every detail, let me offer you an optional line of code that adds a separator between the Page Title and the Blog Name only on pages that have a Page Title: <?php if(wp_title(' ', false)) { echo ' | '; } ?>. Because display is false, wp_title returns the title instead of printing it, and on a page without a title the returned string is empty, so nothing is echoed. Place this line of code between the wp_title and bloginfo template tags, like so:

<title><?php wp_title(''); ?><?php if(wp_title(' ', false)) { echo ' | '; } ?><?php bloginfo('name'); ?></title>
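The logic of that line can be traced outside WordPress with a small sketch. Both functions below are hypothetical stand-ins written for this illustration; demo_title_part assumes that wp_title returns an empty string when the page has no title and otherwise returns the title with the space-wrapped separator in front.

```php
<?php
// Hypothetical stand-in: what wp_title($sep, false) is assumed to return.
// An empty $title models a page with no title, such as the home page.
function demo_title_part($title, $sep) {
    return empty($title) ? '' : " $sep " . $title;
}

// Mirrors the template line above: title part, optional separator, blog name.
function demo_title_tag($title, $blog_name) {
    $out = demo_title_part($title, '');
    if (demo_title_part($title, ' ')) {   // a non-empty string is truthy in PHP
        $out .= ' | ';
    }
    return '<title>' . $out . $blog_name . '</title>';
}

echo demo_title_tag('Page Title', 'Blog Name'), "\n";  // <title>  Page Title | Blog Name</title>
echo demo_title_tag('', 'Blog Name'), "\n";            // <title>Blog Name</title>
```

Note the two spaces in front of “Page Title” in the first result. They only show up in the page source, and they are the subject of the optimization discussed further down.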

Reload the page again, and your title bar should show you exactly what we want, “Page Title | Blog Name”, without a leading separator. Any page without a Page Title, the home page, for example, will just have “Blog Name” in the title tag. Everything up to this point is explained on the WordPress Codex page dealing with Template Tags/wp_title. For most users, this is as far as one wants or needs to go to achieve the desired result.

Further optimization

But… if you’re really humorless about clean code, there’s more to be done. If you view the source code of these pages, you’ll notice that there are a handful of spaces after the opening title tag and before your Page Title. Yikes. Lucky for you, I like my code to be tidy and am also pretty interested in SEO, and for both of these reasons, albeit in unequal parts, these leading spaces in the title tag are unacceptable.

There are three ways to make WordPress close up the spaces whenever we declare an empty string as our separator, as in: <?php wp_title(''); ?>. The first method requires editing a file in the WordPress core. The second method is accomplished by adding a few lines of code to your theme’s ‘functions.php’ file. The third method uses a simple plugin.

Using any of these methods to remove the spaces will also remove the separator that WP wants to add between the year, month, and DD.MM.YY date in the titles of the monthly archives. So if your separator was a pipe symbol, those titles looked something like: 2006 | December | 17.12.06 | Ardamis.com and after removing the spaces they will look like: 2006 December 17.12.06 | Ardamis.com.

Method 1 – the core file

This method involves hacking a core file. This is the most direct way to get the desired result. It basically corrects the problem the moment it happens.

For WordPress version 2.2

The file we want to edit is: \wp-includes\general-template.php. Open it up and find the following lines (beginning at line 224):

$prefix = '';
if ( !empty($title) )
	$prefix = " $sep ";

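Add the line if ( $prefix == '  ' ) { $prefix = ''; } below the block. Note that there are two spaces between the quotes in the comparison, because an empty separator turns " $sep " into two adjacent spaces. The block now reads: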

$prefix = '';
if ( !empty($title) )
	$prefix = " $sep ";
	
if ( $prefix == '  ' ) { $prefix = ''; }
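If you want to convince yourself that the comparison needs two spaces, the following standalone sketch reproduces the block with an empty separator. The values of $title and $sep are assumptions chosen for the illustration; inside WordPress they are set by wp_title itself.

```php
<?php
$title = 'Page Title';     // assumed example title
$sep   = '';               // what wp_title('') passes in as the separator

$prefix = '';
if (!empty($title))
    $prefix = " $sep ";    // an empty separator leaves two adjacent spaces

var_dump($prefix === '  ');   // bool(true): matches two spaces
var_dump($prefix === ' ');    // bool(false): a single space never matches

if ($prefix == '  ') { $prefix = ''; }
echo '[' . $prefix . $title . ']';  // prints "[Page Title]"
```

With any real separator, such as a pipe, the prefix is " | " and the added line leaves it alone.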

Method 2 – functions.php

If you’re not comfortable editing a core file, and if you don’t want to install yet another plugin, this method will also work. Open the ‘functions.php’ file in your theme folder and add the following lines. Depending on your theme, functions.php may already contain some PHP code; that’s ok, just tuck this in at the end. The first line of functions.php should be <?php and the last line should be ?>. If those lines don’t exist, add them and then add the following code between them.

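// Strip leading and trailing whitespace from the title that wp_title produces.
function af_titledespacer($title) {
	return trim($title);
}

// Run af_titledespacer on every title that passes through the wp_title filter.
add_filter('wp_title', 'af_titledespacer');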
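Here is a quick way to see what the filter does to a title, as a sketch that runs outside WordPress. The callback is repeated under the hypothetical name demo_titledespacer because add_filter is not available in a standalone script, and the input strings assume that the filter receives the title with the space-wrapped separator already in front.

```php
<?php
// Same body as the filter callback above, under a hypothetical name so it can
// run outside WordPress (add_filter() is not available here).
function demo_titledespacer($title) {
    return trim($title);
}

var_dump(demo_titledespacer('  Page Title'));   // string(10) "Page Title"
var_dump(demo_titledespacer(' | Page Title'));  // string(12) "| Page Title"
var_dump(demo_titledespacer(''));               // string(0) ""
```

The second call shows a side effect worth knowing about: the filter trims every title, so a title that uses a real separator also loses the space in front of that separator.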

Method 3 – a plugin

If plugins are more your speed, you can get the same results with my Despacer plugin.

Download the Despacer WordPress plugin

Utilizing the changes in the template files

We can now remove the separator and the annoying blank spaces on an instance-by-instance basis by specifying an empty string as the separator, like so: <?php wp_title(''); ?>. By way of example, to hide the separator and remove the blank spaces, a “Page Title | Blog Name” title tag would look like:

<title><?php wp_title(''); ?><?php if(wp_title(' ', false)) { echo ' | '; } ?><?php bloginfo('name'); ?></title>

If anyone finds a better way of arriving at this result, preferably entirely within the template files, please leave me a comment, or post to the WordPress support forum.

You may notice that one poster to the WordPress forums suggests that the search engines don’t care if there is white space in a web page. While I agree that the search engines and browsers don’t have any problem parsing pages that contain chunks of white space, a gap at the beginning of the title tag looks very unnatural to me. No human would intentionally add a bunch of blank spaces to the beginning of the tag, and it’s generally understood that for SEO purposes, a page that looks handcrafted is superior to one that looks like it has been slapped together by a script. This may not be a problem now, but Google and the other search engines are constantly working to remove spam/garbage/scraped sites from their results, and they may one day use weirdly unnatural artifacts like this to identify them.