Code Format Helper for WordPress (Java Program)

Displaying code on a web page can be tricky, and even trickier if you use WordPress. You may have noticed that WordPress turns straight quotes into curly quotes, multiple dashes into en dashes and em dashes, and so on. While this may make our posts look prettier, it does ugly things to code formatting.

(See my post from yesterday on HTML Character Entity References for a table of related characters and encodings.)

Hyphen prettification is one example of where you’ll get into trouble when trying to show some code. Your decrement --i; may get converted to –i;, breaking your code and causing would-be users to hate you. Or String s = "oops"; will become String s = “oops”;, with similarly unhappy copy-and-paste results.

Using the <pre> tag will take care of most of these problems for you… sometimes. I’ll describe some ways that “pre” doesn’t wipe away all of our tears, and offer the WordPress Code Helper as a possible solution:

Screenshot: WordPress Code Format Helper

This simple Java utility will encode all known problematic characters to prevent WordPress interference with your code. (Known by me, that is. Please let me know of others I missed.) The program is made available under the GNU GPL, version 3 or later.

The ideal solution would be a WordPress plugin that handles everything, but I was more interested in learning how to use the NetBeans Java IDE to create GUI applications. If you’re knowledgeable in PHP and creating WordPress plugins, I’ll leave it to you to do the right thing. I think you’d find a lot of people are interested in this feature.

Pitfalls

So why doesn’t <pre> do the job by itself? Here are the problems I’ve seen using WordPress 2.0.x. Maybe things have been improved in later versions, although that could be a problem also, if earlier workarounds break things later. (In any case, with code, it’s best if we can explicitly say what we mean, rather than worry about the whims of various content management systems.)

What are we going to do about it?

We could go around and manually encode all of the problem characters, but that’s a lot of work and it’s easy to miss things. And later if the code changes, we’ll either have to do it all over again, or we’ll have to modify HTML that is hard to read properly as code.

Or we could use a parser that automatically handles everything. Again, using a Java application will be inconvenient for a lot of people, so it would be best to have a WordPress plugin that does this, but here’s what you get for now.

Tags

WordPress Code Helper reads your code line by line, and considers each character. If a < character is encountered, it will look to see if there is a closing > and if one of these strings follows immediately after the <:

pre  /pre  code  /code  span  /span  "a href"  /a  b  /b  i  /i

If so, we’ll include everything from the < to the > “as is” in the target and skip to the next character after the >. It’s assumed there will be a closing > on the same line if it’s really a tag. If not, the < will be encoded. (I’ve only included the tags I might normally use inside a <pre> block. It wouldn’t be difficult to add more.)
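Here’s a minimal Java sketch of that tag check, assuming the scanner holds the current line as a String and is sitting on a <. It isn’t the program’s actual source: TagCheck and tagEnd are hypothetical names, and the rule that a listed name must be followed by a space or = (or end right at the >), so that <br> doesn’t slip through as b, is my own assumption.

```java
import java.util.Arrays;
import java.util.List;

public class TagCheck {

    // The tag strings from the list above; "a href" covers opening links.
    private static final List<String> ALLOWED = Arrays.asList(
            "pre", "/pre", "code", "/code", "span", "/span",
            "a href", "/a", "b", "/b", "i", "/i");

    /**
     * If line.charAt(start) is '<' and begins an allowed tag that closes
     * with '>' on the same line, returns the index of that '>'.
     * Otherwise returns -1, meaning the '<' should be encoded.
     */
    public static int tagEnd(String line, int start) {
        if (line.charAt(start) != '<') {
            return -1;
        }
        int close = line.indexOf('>', start + 1);
        if (close < 0) {
            return -1; // no '>' on this line, so it isn't treated as a tag
        }
        String inner = line.substring(start + 1, close);
        for (String name : ALLOWED) {
            if (inner.equals(name)) {
                return close;
            }
            // Allow attributes after the name, e.g. <pre class="code">,
            // but reject longer names such as <br> matching "b".
            if (inner.startsWith(name)) {
                char next = inner.charAt(name.length());
                if (next == ' ' || next == '=') {
                    return close;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(tagEnd("<pre class=\"code\">", 0));  // 17
        System.out.println(tagEnd("if (a < b && c > d)", 6));  // -1
        System.out.println(tagEnd("<br>", 0));                 // -1
    }
}
```

The caller would copy everything from the < through the returned index as is, or, on -1, encode the < and move on to the next character.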

Character Entities

If an ampersand (&) character is encountered, we’ll look for a closing semi-colon to see if we’re dealing with a character entity that we don’t want to change. If the contents are equal to one of these:

amp  lt  gt  nbsp

Or if they match the regular expression:

#[0-9][0-9][0-9]?[0-9]?           //e.g. "#4321", "#432", "#43"

Then the parser will likewise not encode the & and will skip to the next character after the semi-colon. Again, these are the only character entities I can personally foresee wanting to use in a <pre> block, and it would be easy to add others.
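The entity check can be sketched the same way. EntityCheck and entityEnd are hypothetical names; the named list and the numeric pattern are the ones given above, and the sketch assumes the first semicolon after the & closes the candidate entity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class EntityCheck {

    // Named entities that are left alone.
    private static final List<String> NAMED = Arrays.asList("amp", "lt", "gt", "nbsp");

    // '#' followed by two to four digits, as in the pattern above.
    private static final Pattern NUMERIC = Pattern.compile("#[0-9][0-9][0-9]?[0-9]?");

    /**
     * If line.charAt(start) is '&' and begins a recognized entity that ends
     * with ';' on the same line, returns the index of the ';'.
     * Otherwise returns -1, meaning the '&' should be encoded.
     */
    public static int entityEnd(String line, int start) {
        if (line.charAt(start) != '&') {
            return -1;
        }
        int semi = line.indexOf(';', start + 1);
        if (semi < 0) {
            return -1; // no semicolon on this line
        }
        String body = line.substring(start + 1, semi);
        if (NAMED.contains(body) || NUMERIC.matcher(body).matches()) {
            return semi;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(entityEnd("&lt;", 0));       // 3
        System.out.println(entityEnd("&#0038;", 0));    // 6
        System.out.println(entityEnd("a && b; c", 2));  // -1
    }
}
```

Note that the pattern needs at least two digits, so something like &#9; would have its & encoded.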

Else

If a character is not part of a tag or a character entity, we’ll (potentially) encode it using the following method:

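/**
 * Returns the four-digit numeric entity for a character that WordPress
 * might alter or that HTML would interpret, or the character unchanged.
 */
private static String encode(char c) {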

    String str;

    switch (c) {
        case '&'  : str = "&#0038;"; break;
        case '<'  : str = "&#0060;"; break;
        case '>'  : str = "&#0062;"; break;
        case '\'' : str = "&#0039;"; break;
        case '"'  : str = "&#0034;"; break;
        case '\\' : str = "&#0092;"; break;
        case '-'  : str = "&#0045;"; break;
        case '.'  : str = "&#0046;"; break;
        default   : str = Character.toString(c);
    }

    return str;
}

(Although here is a case where the code formatter didn’t help as much: I had to tinker with the numeric character entities so that they show up unencoded and not as the actual character.)

All character entities are created in the target as four-digit numeric entities instead of named entities. This was a somewhat arbitrary decision, but it may help make clear which characters were encoded automatically and which were encoded by hand, assuming you normally use named entities. You can also convert backwards from target to source: each of the handled numeric encodings is replaced with its original character. This might be useful if you want to modify your code later.
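Converting backwards could look something like this sketch. Decoder and decode are hypothetical names, and the single-pass regex approach is my own choice: with a chain of String.replace calls, depending on the order, one replacement could create text that a later one then matches. Only the eight characters that encode() handles are restored; any other four-digit entity is left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Decoder {

    // The characters that encode() turns into numeric entities.
    private static final String HANDLED = "&<>'\"\\-.";

    // Any four-digit numeric entity, e.g. "&#0060;".
    private static final Pattern ENTITY = Pattern.compile("&#(\\d{4});");

    /** Replaces encode()'s entities with their original characters in one pass. */
    public static String decode(String target) {
        Matcher m = ENTITY.matcher(target);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1));
            String replacement = HANDLED.indexOf(c) >= 0
                    ? String.valueOf(c)
                    : m.group(); // leave entities encode() never produces
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#0045;&#0045;i;"));                // --i;
        System.out.println(decode("String s = &#0034;oops&#0034;;"));  // String s = "oops";
    }
}
```

One limitation, if the parser works as described: a numeric entity you typed yourself, such as &#0060;, is left alone on the way in but would become < on the way back, since the decoder can’t tell who wrote it.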

Not Now

<!-- HTML/XML comments are not currently handled because they’re not something I normally use in blog post code blocks, and I didn’t want to expend effort on them up front. -->

Downloads

WordPress Code Helper is licensed with the GNU GPL, version 3 or later.

The class files were compiled with JDK 6, but I’m guessing they’ll compile and run under Java 1.4. I don’t think I used version 5 or 6 features, although I’m not sure what NetBeans might have used for the generated GUI code. In any case, I’d recommend the latest JVM since I’ve heard that Swing GUI performance has been greatly improved. (Not that this little interface will tax your system.)

Update, 10 Nov 2007: In reading more about NetBeans GUI layouts, I think this project does depend on Java 6 for the GroupLayout that is part of the core libraries. The “Swing Layout Extensions” library supports GroupLayout and is available for Java 5 (and earlier?).

I’ve tested the program with the Java 6 JVM in Ubuntu GNU/Linux and Windows XP.


Comments

  1. Hi,
    wp-code-helper.jar is really a good help for me while writing code to my blog.
    I would like to know if there is any way I can get a larger font, as currently when I paste the converted code into my WordPress blog’s CODE part, it looks too small after publishing.
    So could you suggest anything with respect to the code, or to editing your source code?

    Regards,
    Rajesh.
    rajesh4it@gmail.com

  2. Thanks, Rajesh.

    I’d handle the font in your CSS. For example, I usually put code in a pre block like so:

    <pre class="code">
    source
    code
    goes
    here
    </pre>
    

    And set the style for the pre tag and .code class as:

    pre {
    	overflow: auto;
    	border: 1px solid #afafaf;
    	padding: 3px;
    	margin: 3px;
    	font-size: 1.05em;
    	}
    
    .code {
    	background-color: #eaf9ee;
    	color: black;
    	}
    

    You can style the code tag also, of course.

    I use:

    code {
    	background-color: #eaf9ee;
    	color: black;
    	font-size: 1.1em;
    	}
    

    So that inline code looks like this.


Say Your Say


By submitting your comment here, you agree to license it under the same Creative Commons Attribution-ShareAlike 3.0 License as the movingtofreedom.org web site. Please see policies for more information about comments and privacy.