Category Archives: Web Site Dev

Posts concerning web site design and development. Examples of php, xhtml, and javascript code. Wordpress-related posts may be cross-categorized here, but also have their own category.

I was tired of seeing the majority of my posts’ comments feeds show up in Google’s Supplemental Index, so I changed all the individual posts’ comments RSS links to rel=”nofollow”. This should at least cause Googlebot to stop passing PageRank through those links, but what I really want is for Googlebot to stop spidering the individual posts’ comment feeds, in hopes that they’ll eventually be removed from the index. To see only those pages of a site that are in the Supplemental Index, use this neat little search feature: site:DOMAIN.com *** -view. For example, to see which pages of Ardamis.com are in the SI, I’d search for: site:ardamis.com *** -view. This is much easier than the old way of scanning all of the indexed pages and picking them out by hand.

To change all the individual posts’ comments feed links to rel=”nofollow”, open ‘wp-includesfeed-functions.php’ and add rel=”nofollow” to line 84 (in WordPress version 2.0.6), as so:

echo "<a href="$url" rel="nofollow">$link_text</a>";

One could use the robots.txt file to disallow Googlebot from all /feed/ directories, but this would also restrict it from the general site’s feed and the all-inclusive /comments/feed/, and I’d like the both of these feeds to continue to be spidered. Another, minor consequence of using robots.txt to restrict Googlebot is that Google Sitemaps will warn you of “URLs restricted by robots.txt”.

To deny all spiders from any /feed/ directory, add the following to your robots.txt file:

User-agent:*
Disallow: /feed/

To deny just Googlebot from any /feed/ directory, use:

User-agent: Googlebot
Disallow: /feed/

For whatever reason, the whole-site comments feed at //ardamis.com/comments/feed/ does not appear among my indexed pages, while the nearly empty individual post feeds are indexed. Also, the general site feed at //ardamis.com/feed/ is in the Supplemental Index. It’s a mystery to me why.

I’ll occasionally return to a post and revise it for improved methodology, test results or whatever. But the ‘posted on’ date always remains the same, even after the post has been updated. I feel that displaying only the ‘posted on’ date could be somewhat confusing, particularly when I state that I’ve updated the post in a comment dated months later. So, in the interest of full disclosure, I have added a few lines of code to the WordPress ‘single.php’ template file to supplement each post’s meta data with the date it was last modified. If the post has never been modified or if it was last modified within 24 hours of the ‘posted on’ date, only the ‘posted on’ date is shown.

Of course, this could be used anywhere inside the WordPress loop, not just in the meta data section. The code I use to show a WordPress post’s last modified date and time is as follows.

The default Kubrick template’s meta data section:

<p class="postmetadata alt">
<small>
This entry was posted 
...
on <?php the_time('l, F jS, Y') ?> 
at <?php the_time() ?>
and is filed under <?php the_category(', ') ?>.
You can follow any responses to 
this entry through the 
<?php comments_rss_link('RSS 2.0'); ?> feed. 

The new code, modified to selectively display the last modified date:

<p class="postmetadata alt">
<small>
This entry was posted
...
on <?php the_time('F jS, Y') ?> 
at <?php the_time() ?>
						
<?php $u_time = get_the_time('U'); 
$u_modified_time = get_the_modified_time('U'); 
if ($u_modified_time >= $u_time + 86400) { 
echo "and last modified on "; 
the_modified_time('F jS, Y'); 
echo " at "; 
the_modified_time(); 
echo ", "; } ?>
	
and is filed under <?php the_category(', ') ?>.
You can follow any responses to 
this entry through the 
<?php comments_rss_link('RSS 2.0'); ?> feed. 

You can see how this works in the meta data section of this post.

Further customization

I’m using a grace period of 24 hours from the time the post was published, but you could change this by replacing 86400 with however much time you want, specified in seconds.

I’ve written a WordPress plugin that will convert the page title and post title to ‘title case’ capitalization. Title case is also often referred to as “headline style”, and incorrectly as “initial caps” or “init caps”. Title case means that the first letter of each word is capitalized, except for certain small words, such as articles, coordinating conjunctions, and short prepositions. The first and last words in the title are always capitalized.

This plugin may be useful if you’re trying to give the titles on your site a consistent appearance, but it’s no substitute for writing a good title. There are way too many exceptions and rules to make a simple script behave correctly all of the time.

The plugin is smart enough to not capitalize the following:

  • Coordinating conjunctions (and, but, or, nor, for)
  • Prepositions of four or fewer letters (with, to, for, at, and so on) (limited)
  • Articles (a, an, the) unless the article is the first word in the title

But the plugin isn’t perfect. It won’t capitalize an article that is the last word in the title. It fails on subordinating conjunctions. It conservatively de-capitalizes only some of the prepositions, hopefully reducing the chance of incorrect behavior. For example, it leaves the word over caps, because over can be an adverb, an adjective, a noun, or a verb (caps) or a preposition (not caps), and determining how a word is being used in a title is really beyond the scope of a humble plugin.

The plugin requires you to edit it for certain product names, like “iPod”, and cool-people names, like “Olivia d’Abo” or “Jimmy McNulty”. It’s not savvy enough to know that acronyms, like “HTML”, should be all caps unless they’re used in particular ways, such as in the case of “Using the .html Suffix”, unless you tell it. That said, editing the plugin for these particular words is very easy.

Even with all these limitations, it beats using CSS to {text-transform: capitalize} the titles or just applying PHP’s ucwords() to the entire thing. But I’m guessing that dissatisfaction with one or both of those two methods is what brought you to this page in the first place.

On the upside, it capitalizes any word following a semicolon or a colon, e.g.: “Apollo: A Retrospective Analysis”. It also capitalizes any word immediately preceded by a double or single quote, but only if you haven’t bypassed WordPress’s fancy quotes feature.

How it works

The plugin first finds all words that begin with a double or single fancy WordPress quote and adds a space behind the quote. It capitalizes all of the words in the title with ucwords(), then selectively de-capitalizes some of the words using preg_replace(). It then uses str_ireplace(), a case-insensitive string replace function, to correct the odd capitalization of certain other words. Finally, it removes the spaces behind the quotes.

The code

This is what the code looks like. It should be pretty easy to follow what’s happening.

<?php

function ardamis_titlecase($title) {
		$title = preg_replace("/&#8220;/", '&#8220; ', $title); // find double quotes and add a space behind each instance
 		$title = preg_replace("/&#8216;/", '&#8216; ', $title); // find single quotes and add a space behind each instance
		$title = preg_replace("/(?<=(?<!:|;)W)(A|An|And|At|But|By|Else|For|From|If|In|Into|Nor|Of|On|Or|The|To|With)(?=W)/e", 
'strtolower("$1")', ucwords($title));  // de-capitalize certain words unless they follow a colon or semicolon
		$specialwords = array("iPod", "iMovie", "iTunes", "iPhone", " HTML", ".html", " PHP", ".php"); // form a list of specially treated words
		$title = str_ireplace($specialwords, $specialwords, $title); // replace the specially treated words
		$title = preg_replace("/&#8220; /", '&#8220;', $title); // remove the space behind double quotes
		$title = preg_replace("/&#8216; /", '&#8216;', $title); // remove the space behind single quotes

		return $title;
}

add_filter('wp_title', 'ardamis_titlecase');
add_filter('the_title', 'ardamis_titlecase');

?>

Download

Download the plugin, upload it to your site, and activate it.

Download the Title Case Capitalization WordPress plugin

Further customization

The plugin won’t alter words written in all caps or CamelCase. You could use ucwords(strtolower($title)) to convert the entire $title to lowercase before applying ‘ucwords’. This may fix instances where someone has typed in a bunch of titles with the caps lock key on. But you’ll then have to compensate for words that should be all caps, like ‘HTML’, ‘NBC’, or ‘WoW’, in $specialwords.

An alternative using a ‘foreach’ loop

It’s possible to do something similar using a foreach loop. This isn’t as graceful, in my opinion, but I suppose it’s possible that someone may find it works better.

<?php

function ardamis_titlecase($title) {
	$donotcap = array('a','an','and','at','but','by','else','for','from','if','in','into','nor','of','on','or','the','to','with'); 
	// Split the string into separate words 
	$words = explode(' ', $title); 
	foreach ($words as $key => $word) { 
		// Capitalize all but the $donotcap words and the first word in the title
		if ($key == 0 || !in_array($word, $donotcap)) $words[$key] = ucwords($word); 
		if (preg_match("/^&#8220;/", $word))
			$words[$key] = '&#8220;' . ucwords(substr($word, 7));
		elseif (preg_match("/^&#8216;/", $word))
			$words[$key] = '&#8216;' . ucwords(substr($word, 7));
	} 
	// Join the words back into a string 
	$newtitle = implode(' ', $words); 
	return $newtitle; 
}

add_filter('wp_title', 'ardamis_titlecase');
add_filter('the_title', 'ardamis_titlecase');

?>

Credits

Thanks to Chris for insight into the preg_replace code at http://us2.php.net/ucwords. Thanks to Thomas Rutter for insight into the foreach code at SitePoint Blogs » Title Case in PHP.

I needed to find a satisfactory way of adding WordPress tags and theme elements (such as the sidebar) to pages that exist outside of WordPress. A non-WordPress page could then appear to be seemlessly incorporated into the site, wherein the layout automatically updates with changes to the theme template files, and could use the same header, sidebar, and footer as a normal WordPress page.

The first few solutions that I found involved adding a line to each non-WordPress page. This does indeed allow the page to incorporate WordPress tags, theme elements and styles, but there is a serious drawback to this method because of the way WP manages a web site.

When you click on the link to a WP page, or enter it into the address bar, you aren’t actually going to a file that resides at that address. Instead, WP uses that address as an instruction to pull various database entries and form an index.php page that resides in the WP installation directory. For example, while the address for this page appears to be https://ardamis.com/2006/07/10/wordpress-googlebot-404-error/ , the actual page is at https://ardamis.com/index.php.

WordPress assumes that it is responsible for every page on the site, and that there are no pages on the site that it doesn’t know about. If you try to browse to a page that WP doesn’t know about, it sends you a 404 error page instead. This is what you want it to do, so long as you don’t create any pages outside of WordPress.

But a problem arises when you do want to create a page that WP doesn’t know about. When you visit the page, WP checks the database and finds that it doesn’t exist, so it puts a 404 error in the http header, but the page does exist, so Apache sends the page with the 404 error. This behavior seemed to cause some problems with some versions of IE but none with Firefox. It did, however, result in a 404 header being given to Googlebot, so that non-WordPress pages would incorrectly show up in Google Sitemaps as Not Found.

To get around this problem and send the correct http header code: HTTP Status Code: HTTP/1.1 200 OK, I needed to require a different file, wp-config.php, and then select specific functions for use in the page. results in a page that can use all of the desired tags and theme elements and also sends the correct header code: HTTP Status Code: HTTP/1.1 200 OK

The following code results in a page that can use all of the tags and theme elements (you may need to adjust the path to wp-config.php):

<?php require('./wp-config.php');
$wp->init();
$wp->parse_request();
$wp->query_posts();
$wp->register_globals();
?>

<?php get_header(); ?>

<div id="content">
<div class="post">
<h2>*** Heading Goes Here ***</h2>
<div class="entry">
*** Content in Paragraph Tags Goes Here ***
</div>
</div>
</div>

<?php get_sidebar(); ?>

<?php get_footer(); ?>

Testing the method

Using wp-blog-header.php as the include, I created a GoogleBot/WordPress 404 test page as the index.php file in a /testing/ directory. I added the url https://ardamis.com/testing/ to my Google XML sitemap file, and waited for the sitemap to be downloaded. Sure enough, a few days later, Google Sitemaps was listing the /testing/ url among the Not Found errors.

The next step was to remove what I suspected was the culprit, the include of the WordPress header, wp-blog-header.php, and see if Googlebot could again access the page. A few days after removing the include, and after the sitemap was downloaded, the Not Found error disappeared. I’m interpreting this as Googlebot once again successfully crawling the page.

The third step was to use the above code, including wp-config.php and then testing the HTTP Request and Response Header. The header looks ok, and Googlebot likes it. It looks like this does the trick.

This post was written in 2006. As of WordPress 2.5 (released in 2008), a new seplocation parameter has been added to wp_title. This allows you to reverse the page title and blog name in the title tag, in much the same way as I have described in this post. The page at http://codex.wordpress.org/Function_Reference/wp_title provides this example:

<title><?php wp_title('|',true,'right'); ?><?php bloginfo('name'); ?></title>

I’d recommend using it, instead of the admittedly complicated instructions below.

Getting the title tag just right in WordPress isn’t as easy as it ought to be. Currently, a popular title syntax for SEO purposes shows the page’s title, followed by a pipe separator, followed by the site’s name. In practice, this preferred syntax would appear as “Page Title | Site Name”. For whatever reason, the default theme in WordPress has this order reversed, so that each page’s title starts with the blog name, followed by a » separator, some useless clutter, another » separator and then the page’s title. The instructions below will help you optimize the title tag to take advantage of the prefered method.

The code for the default WordPress title tag, which is found in the “header.php” file, looks like this:

<title><?php bloginfo('name'); ?> <?php if ( is_single() ) { ?> &raquo; Blog Archive <?php } ?> <?php wp_title(); ?></title>

It seems like it should be an easy thing to clean up. We remove the unnecessary “Blog Archive” stuff and then switch the two title template tags, putting <?php bloginfo('name'); ?> behind <?php wp_title(); ?>.

Our title tag code now looks like this:

<title><?php wp_title(); ?><?php bloginfo('name'); ?></title>

But if you make this obvious change and reload one of your blog’s post pages in your browser, you’ll notice that the separator, which is inextricably part of the wp_title template tag, wants to be in front of the page title and is now the very first character in your browser’s title bar, resulting in something like “» Page Title Blog Title”. Furthermore, we are missing a desired separator between the page title and the blog title.

Let’s first do something about that initial separator. The wp_title tag we’ve been using so far, <?php wp_title(); ?>, is abbreviated, meaning that there are some options that are being allowed to fall back to their default states because we haven’t specifically provided otherwise. Changing the behavior of the wp_title separator is as easy as manipulating these options in the unabbreviated wp_title template tag. The full tag, including the options, looks something like: <?php wp_title('sep', display); ?>, where ‘sep‘ stands for whatever separator you want and display is either “true” or “false”, depending on whether you want the title displayed. For example, if you want to use the pipe symbol ” | ” to appear at the beginning of your post title, you would use: <?php wp_title('|'); ?>. (The display option defaults to “true”, which is what we want here, so I’ll omit that part in the future for the sake of brevity.)

This fiddling with different separators works just fine when the elements of the title are in the default order, but when we put wp_title at the beginning of our title tag, we get a separator as the first character in our title. We don’t want a separator in front of the Page Title, so we will use the ‘sep‘ option described above to tell WordPress to use an empty string (represented by the absence of text between two quotes) as the separator, like so: <?php wp_title(''); ?>. This is the preferred method for removing the leading separator from the wp_title tag. Now the code for our title tag looks like:

<title><?php wp_title(''); ?><?php bloginfo('name'); ?></title>

This title will cause your browser’s title bar to display “Page Title Blog Name”. We are getting closer to what we want.

Without explaining exactly how it works, let me just offer you an optional line of code to selectively add or omit a separator between the Page Title and the Blog Name as appropriate for each page: <?php if(wp_title(' ', false)) { echo ' | '; } ?>. Place this line of code between the wp_title and bloginfo template tags, as so:

<title><?php wp_title(''); ?><?php if(wp_title(' ', false)) { echo ' | '; } ?><?php bloginfo('name'); ?></title>

Reload the page again, and your title bar should show you exactly what we want, “Page Title | Blog Name”, without a leading separator. Any page without a Page Title, the home page, for example, will just have “Blog Name” in the title tag. Everything up to this point is explained on the WordPress Codex page dealing with Template Tags/wp_title. For most users, this is as far as one wants or needs to go to achieve the desired result.

Further optimization

But… if you’re really humorless about clean code, there’s more to be done. If you view the source code of these pages, you’ll notice that there are a handful of spaces after the opening title tag and before your Page Title. Yikes. Lucky for you, I like my code to be tidy and am also pretty interested in SEO, and for both of these reasons, albeit in unequal parts, these leading spaces in the title tag are unacceptable.

There are three ways to make WordPress close up the spaces whenever we declare an empty string as our separator, as in: <?php wp_title(''); ?>. The first method requires editing a file in the WordPress core. The second method is accomplished by adding a few lines of code to your theme’s ‘functions.php’ file. The third method uses a simple plugin.

Using any of these methods to remove the spaces will also remove the separator that WP wants to add between the year, month, and DD.MM.YY date of the titles of the monthly archives. So if your separator was a pipe symbol, they looked something like: 2006 | December | 17.12.06 | Ardamis.com and after removing the spaces they will look like: 2006 December 17.12.06 | Ardamis.com

Method 1 – the core file

This method involves hacking a core file. This is the most direct way to get the desired result. It basically corrects the problem the moment it happens.

For WordPress version 2.2

The file we want to edit is: \wp-includes\general-template.php. Open it up and find the following lines (beginning at line 224):

$prefix = '';
if ( !empty($title) )
	$prefix = " $sep ";

Add the line if ( $prefix == ' ' ) { $prefix = ''; } below the block, so that the block now reads:

$prefix = '';
if ( !empty($title) )
	$prefix = " $sep ";
	
if ( $prefix == '  ' ) { $prefix = ''; }

Method 2 – functions.php

If you’re not comfortable editing a core file, and if you don’t want to install yet another plugin, this method will also work. Open the ‘functions.php’ file in your theme folder and add the following lines. Depending on your theme, functions.php may already contain some PHP code; that’s ok, just tuck this in at the end. The first line of functions.php should be <?php and the last line should be ?>. If those lines don’t exist, add them and then add the following code between them.

function af_titledespacer($title) {
	return trim($title);
}

add_filter('wp_title', 'af_titledespacer');

Method 3 – a plugin

If plugins are more your speed, you can get the same results with my Despacer plugin.

Download the Despacer WordPress plugin

Utilizing the changes in the template files

We can now remove the separator and the annoying blank spaces on an instance-by-instance basis by specifying an empty string as the separator, as so: <?php wp_title(''); ?>. By way of example, to hide the separator and remove the blank spaces, a “Page Title | Blog Name” title tag would look like:

<title><?php wp_title(''); ?><?php if(wp_title(' ', false)) { echo ' | '; } ?><?php bloginfo('name'); ?></title>

If anyone finds a better way of arriving at this result, preferably entirely within the template files, please leave me a comment, or post to the WordPress support forum.

You may notice that one poster to the WordPress forums suggests that the search engines don’t care if there is white space in a web page. While I agree that the search engines and browsers don’t have any problem parsing pages that contain chunks of white space, a gap at the beginning of the title tag looks very unnatural to me. No human would intentionally add a bunch of blank spaces to the beginning of the tag, and it’s generally understood that for SEO purposes, a page that looks handcrafted is superior to one that looks like it has been slapped together by a script. This may not be a problem now, but Google and the other search engines are constantly working to remove spam/garbage/scraped sites from their results, and they may one day use weirdly unnatural artifacts like this to identify them.

Updated for XAMPP version 1.6.5

Some time ago, I decided to start phasing out static xhtml in favor of pages using PHP includes. To test these new pages, I used apachefriends.org’s wonderful XAMPP (which I really can’t recommend highly enough) to install Apache, MySQL, and PHP (among other things). Once I had my local server running, I put each dev site into its own folder in \htdocs\ and navigated to them by http://127.0.0.1/foldername/.

This setup was functional but far from ideal, as the index pages for these local sites weren’t in what could be considered a root directory, which lead to some tip-toeing around when creating links.

Then I discovered the NameVirtualHost feature in Apache. NameVirtualHost allows the server admin to set up multiple domains/hostnames on a single Apache installation by using VirtualHost containers. In other words, you can run more than one web site on a single machine. This means that each dev site (or domain) can then consider itself to have a root directory. You will be able to access each local site as a subdomain of “localhost” by making a change to the HOSTS file. For example, I access the local dev version of this site at http://ardamis.localhost/.

This works great for all sorts of applications that rely on the site having a discernible root directory, such as WordPress.

Unfortunately, setting up NameVirtualHost can be kind of tricky. If you are having problems configuring your Apache installation to use the NameVirtualHost feature, you’re in good company. Here’s how I managed to get it working:

For XAMPP version 1.6.5

  1. Create a folder in drive:\xampp\htdocs\ for each dev site (adjust for your directory structure). For example, if I’m creating a development site for ardamis.com on my d: drive, I’d create a folder at:
    d:\xampp\htdocs\ardamis\
  2. Edit your HOSTS file (in Windows XP, the HOSTS file is located in C:\WINDOWS\system32\drivers\etc\) to add the following line, where sitename is the name of the folder you created in step 1. Don’t change or delete the existing “127.0.0.1 localhost” line.
    127.0.0.1 sitename.localhost
    

    Add a new line for each dev site folder you create.

    Continuing with the example, I’ve added the line:
    127.0.0.1 ardamis.localhost

  3. Open your drive:\xampp\apache\conf\extra\httpd-vhosts.conf file and add the following lines to the end of the file, using the appropriate letter in place of drive. Do this step only once. We’ll add code for each dev site’s folder in the next step. (Yes, keep the asterisk.)
    NameVirtualHost *:80
    <VirtualHost *:80>
        DocumentRoot "drive:/xampp/htdocs"
    ServerName localhost
    </VirtualHost>
    

    My DocumentRoot line would be:
    DocumentRoot "d:/xampp/htdocs"

  4. Immediately after that, add the following lines, changing sitename to the name of the new dev site’s folder, again using the appropriate letter in place of drive. Repeat this step for every folder you’ve created.
    <VirtualHost *:80>
        DocumentRoot "drive:/xampp/htdocs/sitename"
        ServerName sitename.localhost
    </VirtualHost>
    

    My DocumentRoot line would be:
    DocumentRoot "d:/xampp/htdocs/ardamis"
    My ServerName line would be:
    ServerName ardamis.localhost

  5. Reboot your computer to be sure it’s using the new HOSTS file (you’ll have to at least restart Apache). You should now be able to access each dev domain by way of:
    http://sitename.localhost/

For XAMPP version 1.4

If you are using an older version of XAMPP (like XAMPP version 1.4) without the httpd-vhosts.conf file, use the instructions below.

  1. Create a folder in your drive:\apachefriends\xampp\htdocs\ for each local version of your site. For example, if I’m creating a development site for ardamis.com on my f: drive, I’d create a folder at:
    f:\apachefriends\xampp\htdocs\ardamis\
  2. Open your HOSTS file (in Windows XP, the HOSTS file is located in C:\WINDOWS\system32\drivers\etc\) and add the following line, where sitename is the name of the folder you created in step 1. Repeat this step, as necessary, for each folder you create. Don’t change or delete the existing “127.0.0.1 localhost” line.
    127.0.0.1 sitename.localhost
    

    Continuing with the example, I’ve added the line:
    127.0.0.1 ardamis.localhost

  3. Open your drive:\apachefriends\xampp\apache\conf\httpd.conf file and add the following lines to the end of the file, using the appropriate letter for drive. Do this step only once. We’ll add code for each dev site’s folder in the next step. (Yes, keep the asterisk.)
    NameVirtualHost *:80
    <VirtualHost *:80>
        DocumentRoot "drive:/apachefriends/xampp/htdocs"
    ServerName localhost
    </VirtualHost>
    

    My DocumentRoot line would be:
    DocumentRoot "f:/apachefriends/xampp/htdocs"

  4. Immediately after that, add the following lines, changing sitename to the name of the new folder, using the appropriate letter for drive and repeating this step for every folder you’ve created.
    <VirtualHost *:80>
        DocumentRoot "drive:/apachefriends/xampp/htdocs/sitename"
        ServerName sitename.localhost
    </VirtualHost>
    

    My DocumentRoot line would be:
    DocumentRoot "f:/apachefriends/xampp/htdocs/ardamis"

  5. Reboot and restart Apache. Open a browser; you should now be able to access each folder by way of:
    http://sitename.localhost

I’m assuming that you could change the DocumentRoot line to point to any folder on any drive. I’ll experiment with pointing this at a folder on another drive later.

The official Apache.org documentation for VirtualHost is at http://httpd.apache.org/docs/2.2/vhosts/. You may want to read that for further details before you try to set up virtual hosts.

If you have any questions about the above instructions, the Apache NameVirtualHost function, XAMPP, or anything in between, post a comment, but I can’t promise that I’ll be able to help. I’m learning as I go along, too.