WordPress is very good about supporting standards and producing valid markup, and at least when I started using it, it had a link in the standard theme to proclaim the validity of its pages and prove it to you by taking you to the W3C Markup Validation Service. A lot of people never pay this any attention and promptly produce a bunch of non-complying pages, all the while shamefully leaving the boastful link in the sidebar or footer of their site.

Being a retentive sort about many things, I’ve always worked to ensure my pages validate correctly, looking for the highly satisfying message below on each and every one of these posts that I strive mightily to create for you:

Screenshot of W3C Markup Validation Page

(Versus the alternative which is highlighted with red instead of green, and tells you that you are not just inadequate but are a bad person as well.)

But this can be a tedious chore. To validate by URI, your page must already be published so that the W3C server can reach it. This will never do. We must have compliance before publication! I work on my blog locally, so I can publish the page and then capture the HTML source to paste into the Direct Input Validator, but this isn’t at all convenient, especially if there are errors to fix and the whole cycle has to be repeated.

WordPress XHTML Validator Plugin

I considered making a WordPress plugin to give me a button on the edit screen which would let me send the preview page source HTML directly to the W3C validator. Then I could perform a quick verification after composing a post. But I’m not versed in WordPress plugins and not keen on spending too much time in that area right now. I’d previously looked for plugins without finding a good one, but recently tried again. So often you can sweat over something and then find out someone has already done the work for you two years ago.

For example, this time my search turned up the WordPress XHTML Validator from rudd-o.com. It’s been waiting patiently since 2005 for me to find it, and it’s perfect for what I’m trying to do. I’ve been using it for a week now and it works great in WordPress 2.0.x. (As with most software I write about here, it’s free software, and I’m grateful to Manuel Amador for creating and freely sharing it.)

It uses two popular command line utilities, xmllint and html tidy, and validates every time you press the “Save and Continue Editing” button. There is also a feature to check all posts and pages and produce a list of problem pages.

xmllint

I think I’ve heard of this program before but wasn’t familiar with what it can do. Not surprisingly given the topic of this post, it can validate XML. It was already installed on my Ubuntu 7.04/Feisty Fawn system. I thought it would be excessive and slow things down to validate against the W3C web site on every save, but it doesn’t use the external DTD. (However, so far I haven’t determined where it gets the DTD from.) A character entity file is included to let it correctly parse things like &nbsp;. Since it has low overhead, I like the idea of validating every time and keeping it clean as I go. Saves an extra step of checking later and finding and fixing several things at once.
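For a feel of what a local check involves, here is a minimal Python sketch of calling xmllint on a saved page. It is not the plugin’s code, and the function names and choice of flags are assumptions made for illustration. The flags themselves are standard xmllint options: --noout suppresses the re-printed document, --valid validates against the DTD named in the DOCTYPE, and --nonet forbids network fetches.

```python
import shutil
import subprocess


def build_xmllint_command(path, allow_network=False):
    """Return an xmllint command line that DTD-validates one file."""
    command = ["xmllint", "--noout", "--valid"]
    if not allow_network:
        # --nonet stops xmllint from fetching DTDs or entities over the network.
        command.append("--nonet")
    command.append(path)
    return command


def validate_file(path):
    """Run xmllint on path (hypothetical helper, not part of the plugin).

    Returns (ok, messages), or None when xmllint is not installed.
    """
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        build_xmllint_command(path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with 0 on success and writes its complaints to stderr.
    return result.returncode == 0, result.stderr
```

With --nonet in place, validation can only succeed if the DTD is available locally (for example through an XML catalog); otherwise xmllint reports an error rather than going to the network.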

html tidy

This one isn’t installed on my machine but I could add it easily enough with sudo apt-get install tidy. I suspect it can be used to quickly fix problems with the HTML, but I’m just as happy managing this myself, so I haven’t installed it.

You can run the plugin with either program. It’s useful to me with only xmllint; I’m not sure how well it would do with only html tidy.

Web Host Considerations

If you want to run this plugin, you’ll need xmllint and/or html tidy installed on your web hosting server. It also relies on PHP5. My host, SurpassHosting, has xmllint, but is currently at PHP4, so I can’t use it there. But it works fine on my local machine which has PHP5, and this is where I compose my posts, so it’s not really necessary to run the plugin on my “real” blog.

Although I’d like to be able to use the feature to check all pages, just to be sure, speaking of which…

OCD Considerations

Even though I’ve tried validating all my pages along the way, I ran into a problem when checking my old posts. When WordPress generates pages, it fixes many things for you. So even though the whole page might validate against the W3C site, the post content itself on a whole bunch of pages showed up with errors in the plugin report. It didn’t like unencoded ampersands in URLs and missing </p> tags, among other things. So I felt compelled to fix all of these, because I want to be able to run all the pages and get a clean bill of health.
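Unencoded ampersands in URLs and missing </p> tags are both well-formedness errors in XHTML, so any XML parser will catch them in raw post content. Here is a minimal Python sketch, assuming the content is wrapped in a hypothetical <div> and that &nbsp; is the only named entity in use. It checks well-formedness only, not validity against the XHTML DTD, and it is not the plugin’s method.

```python
import xml.etree.ElementTree as ET

# Declare the entities the fragment may use. A real setup would load a full
# XHTML entity list instead of this single hand-written declaration.
WRAPPER = (
    '<!DOCTYPE div [<!ENTITY nbsp "&#160;">]>'
    "<div>{content}</div>"
)


def check_fragment(content):
    """Return None if the fragment is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(WRAPPER.format(content=content))
    except ET.ParseError as error:
        return str(error)
    return None


samples = [
    '<p>Fine&nbsp;here</p>',
    '<p><a href="http://example.com/?a=1&b=2">raw ampersand</a></p>',
    '<p><a href="http://example.com/?a=1&amp;b=2">encoded ampersand</a></p>',
    '<p>one<p>two</p>',
]
for sample in samples:
    problem = check_fragment(sample)
    print("OK " if problem is None else "BAD", sample)
```

The raw-ampersand link and the unclosed paragraph are flagged, while the &nbsp; sample and the link written with &amp; pass.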

W3C DTD Netiquette

In related news, Slashdot posted a story about excessive DTD requests to W3C servers that I found interesting in light of all this XHTML validation.

From the W3C blog posting:

If you view the source code of a typical web page, you are likely to see something like this near the top:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

and/or

<html xmlns="http://www.w3.org/1999/xhtml" ...>

These refer to HTML DTDs and namespace documents hosted on W3C’s site.

Note that these are not hyperlinks; these URIs are used for identification. This is a machine-readable way to say “this is HTML”. In particular, software does not usually need to fetch these resources, and certainly does not need to fetch the same one over and over! Yet we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.

[...]

In one case we noticed, a number of IP addresses at one company were requesting DTDs from our site more than three hundred thousand times per day each, per IP address.

–Ted Guild, “W3C’s Excessive DTD Traffic”

With that said, I feel better about a solution that doesn’t continually check with the W3C web site!

But then again, if you look on their Markup Validation page, they encourage us like so:

Congratulations

The document located at <http://www.movingtofreedom.org/> was checked and found to be valid XHTML 1.0 Transitional. This means that the resource in question identified itself as “XHTML 1.0 Transitional” and that we successfully performed a formal validation using an SGML or XML Parser (depending on the markup language used).

To show your readers that you have taken the care to create an interoperable Web page, you may display this icon on any page that validates. Here is the HTML you could use to add this icon to your Web page:

Valid XHTML 1.0 Transitional

<p>
  <a href="http://validator.w3.org/check?uri=referer"><img
      src="http://www.w3.org/Icons/valid-xhtml10"
      alt="Valid XHTML 1.0 Transitional" height="31" width="88" /></a>
</p>

So they’re just asking for it. :-)

But to be fair, random requests from curious readers aren’t going to amount to much in the face of companies that make hundreds of thousands of requests every day.
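For software that really does need the DTD text, the W3C’s complaint about fetching the same resource over and over points at an obvious remedy: fetch once and reuse. Below is a minimal Python sketch of that idea using an in-memory cache. The names make_cached_loader and fake_fetch are hypothetical, and the fake fetcher stands in for a real download so the example never touches the network.

```python
from functools import lru_cache

XHTML_STRICT_DTD = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"


def make_cached_loader(fetch):
    """Wrap fetch(uri) so that each URI is fetched at most once per process."""

    @lru_cache(maxsize=None)
    def load(uri):
        return fetch(uri)

    return load


# Stand-in for a real download; it only records that it was called.
requests_made = []


def fake_fetch(uri):
    requests_made.append(uri)
    return "<!-- DTD text for " + uri + " -->"


load_dtd = make_cached_loader(fake_fetch)
for _ in range(1000):
    load_dtd(XHTML_STRICT_DTD)

print(len(requests_made), "request(s) for 1000 lookups")
```

A cache like this only lasts for the life of the process; keeping a copy of the DTD on disk and resolving to it locally would avoid even the first request on later runs.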