Code Format Helper for WordPress (Java Program)
Displaying code on a web page can be tricky, and even trickier if you use WordPress. You may have noticed in WordPress that straight quotes turn into curly quotes, multiple dashes turn into en dashes and em dashes, and so on. While this may make our posts look prettier, it does ugly things to code formatting.
(See my post from yesterday on HTML Character Entity References for a table of related characters and encodings.)
Hyphen prettification is one example of where you’ll get into trouble when trying to show some code. Your decrement --i; may get converted to –i;, breaking your code and causing would-be users to hate you. Or String s = "oops"; will become String s = “oops”;, with similarly unhappy copy-and-paste results.
Using the <pre> tag will take care of most of these problems for you… sometimes. I’ll describe some ways that “pre” doesn’t wipe away all of our tears, and offer the WordPress Code Helper as a possible solution:
Screenshot
This simple Java utility will encode all known problematic characters to prevent WordPress interference with your code. (Known by me, that is. Please let me know of others I missed.) The program is made available under the GNU GPL, version 3 or later.
The ideal solution would be a WordPress plugin that handles everything, but I was more interested in learning how to use the NetBeans Java IDE to create GUI applications. If you’re knowledgeable in PHP and creating WordPress plugins, I’ll leave it to you to do the right thing. I think you’d find a lot of people are interested in this feature.
Pitfalls
So why doesn’t <pre> do the job by itself? Here are the problems I’ve seen using WordPress 2.0.x. Maybe things have been improved in later versions, although that could be a problem also, if earlier workarounds break things later. (In any case, with code, it’s best if we can explicitly say what we mean, rather than worry about the whims of various content management systems.)
- There is a problem with the \backslash. Backslashes inside <pre> tags in static HTML work fine, but in WordPress some processing happens that removes a single backslash, apparently treating it as an escape character. I found that putting a second backslash causes only one to be displayed. This is worrisome: what if the behavior changed in the future or I wanted to migrate to some other software that didn’t do this? It bothered me to have that chance of leaving a trail of broken code samples.
- There are several things that are encoded correctly if you don’t use other tags in your <pre> block, but if you use another tag, everything following the tag will be encoded with normal WordPress rules. I first discovered this when I wanted to include a hyperlink in one of the code comments. Other reasons you may include tags inside your “pre”: you want to use <span> to color your code comments, or you might want to make keywords and method names bold or italic.
- Finally, there are the standard HTML/XML meta characters: & < >. These might display OK inside <pre>, but they aren’t valid, and if you’re obsessive about your pages validating like I am, this will bother you.
What are we going to do about it?
We could go around and manually encode all of the problem characters, but that’s a lot of work and it’s easy to miss things. And later if the code changes, we’ll either have to do it all over again, or we’ll have to modify HTML that is hard to read properly as code.
Or we could use a parser that automatically handles everything. Again, using a Java application will be inconvenient for a lot of people, so it would be best to have a WordPress plugin that does this, but here’s what you get for now.
Tags
WordPress Code Helper reads your code line by line, and considers each character. If a < character is encountered, it will look to see if there is a closing > and if one of these strings follows immediately after the <:
pre /pre code /code span /span "a href" /a b /b i /i
If so, we’ll include everything from the < to the > “as is” in the target and skip to the next character after the >. It’s assumed there will be a closing > on the same line if it’s really a tag. If not, the < will be encoded. (I’ve only included the tags I might normally use inside a <pre> block. It wouldn’t be difficult to add more.)
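The tag check described above can be sketched roughly like this. It is not the code from the download, just a minimal illustration: the class and method names are made up for this example, and the extra check for a delimiter after the tag name (so that "<bar>" isn't mistaken for "<b>") is my own assumption rather than something the post specifies.

```java
class TagSniffer {
    // Tag openers listed in the post; only these are passed through untouched.
    private static final String[] ALLOWED = {
        "pre", "/pre", "code", "/code", "span", "/span",
        "a href", "/a", "b", "/b", "i", "/i"
    };

    /**
     * If line.charAt(pos) is '<' and starts an allowed tag that is closed by
     * '>' on the same line, returns the index just past that '>'.
     * Returns -1 otherwise, meaning the '<' should be encoded.
     */
    static int passThroughTagEnd(String line, int pos) {
        if (pos >= line.length() || line.charAt(pos) != '<') {
            return -1;
        }
        int close = line.indexOf('>', pos + 1);
        if (close < 0) {
            return -1; // no closing '>' on this line, so not treated as a tag
        }
        for (String name : ALLOWED) {
            if (line.startsWith(name, pos + 1)) {
                // Assumption: the name must be followed by '>', a space, or '='.
                char next = line.charAt(pos + 1 + name.length());
                if (next == '>' || next == ' ' || next == '=') {
                    return close + 1;
                }
            }
        }
        return -1;
    }
}
```

A caller would copy line.substring(pos, end) to the target unchanged when the result is not -1, and continue from end. Note that something like "a <b && c> d" would still be mistaken for a tag under this scheme.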
Character Entities
If an ampersand (&) character is encountered, we’ll look for a closing semicolon to see if we’re dealing with a character entity that we don’t want to change. If the contents are equal to one of these:
amp lt gt nbsp
Or match the regular expression:
#[0-9][0-9][0-9]?[0-9]? //e.g. "#4321", "#432", "#43"
Then the parser will likewise not encode the & and will skip to the next character after the semicolon. Again, these are the only character entities I can personally foresee wanting to use in a <pre> block, and it would be easy to add others.
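Here is a rough sketch of that entity check, under the assumption that the four names and the numeric pattern can be folded into a single regular expression. The class and method names are hypothetical and not taken from the actual source.

```java
import java.util.regex.Pattern;

class EntitySniffer {
    // The named entities from the post, or '#' followed by two to four digits.
    private static final Pattern KEEP =
        Pattern.compile("amp|lt|gt|nbsp|#[0-9][0-9][0-9]?[0-9]?");

    /**
     * If line.charAt(pos) is '&' and starts one of the recognized entities,
     * returns the index just past the closing ';'.
     * Returns -1 otherwise, meaning the '&' should be encoded.
     */
    static int entityEnd(String line, int pos) {
        if (pos >= line.length() || line.charAt(pos) != '&') {
            return -1;
        }
        int semi = line.indexOf(';', pos + 1);
        if (semi < 0) {
            return -1; // no semicolon on this line
        }
        String contents = line.substring(pos + 1, semi);
        return KEEP.matcher(contents).matches() ? semi + 1 : -1;
    }
}
```

With this, the & in "a &lt; b" is left alone, while the first & in "x && y;" is encoded because "& y" is not a recognized entity name.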
Else
If a character is not part of a tag or a character entity, we’ll (potentially) encode it using the following method:
private static String encode(char c) {
    String str;
    switch (c) {
        // Each problem character becomes a four-digit numeric entity.
        case '&'  : str = "&#0038;"; break;
        case '<'  : str = "&#0060;"; break;
        case '>'  : str = "&#0062;"; break;
        case '\'' : str = "&#0039;"; break;
        case '"'  : str = "&#0034;"; break;
        case '\\' : str = "&#0092;"; break;
        case '-'  : str = "&#0045;"; break;
        case '.'  : str = "&#0046;"; break;
        // Everything else is copied through unchanged.
        default   : str = Character.toString(c);
    }
    return str;
}
(Although here is a case where the code formatter didn’t help as much: I had to tinker with the numeric character entities so that they show up unencoded and not as the actual character.)
All character entities are created in the target as four-digit numeric entities instead of named entities. This was a somewhat arbitrary decision. It may help make it clear which characters were automatically encoded versus which may have been manually encoded, assuming you normally use named entities. You can also convert backwards from target to source: all of the handled numeric encodings will be replaced with the original character. This might be useful if you want to modify your code later.
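The backwards conversion could look something like the sketch below. It assumes the eight handled characters are written as their four-digit decimal character codes (0034 for the double quote, 0092 for the backslash, and so on), and the class name is invented for the example. Entities outside that set, including named ones, are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Four-digit codes for the handled characters: " & ' - . < > \
    private static final Pattern HANDLED =
        Pattern.compile("&#(0034|0038|0039|0045|0046|0060|0062|0092);");

    /** Replaces each handled numeric entity with its original character. */
    static String decode(String target) {
        Matcher m = HANDLED.matcher(target);
        StringBuffer source = new StringBuffer();
        while (m.find()) {
            char original = (char) Integer.parseInt(m.group(1));
            // quoteReplacement stops '\' from being read as replacement syntax.
            m.appendReplacement(source,
                Matcher.quoteReplacement(String.valueOf(original)));
        }
        m.appendTail(source);
        return source.toString();
    }
}
```

Because only the automatically generated codes are matched, anything encoded by hand with a named entity such as &amp;amp; survives the round trip, which fits the reasoning above for preferring numeric entities.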
Not Now
<-- HTML/XML comments are not currently handled because they’re not something I normally use in blog post code blocks and I didn’t want to expend effort on them up front. -->
Downloads
WordPress Code Helper is licensed with the GNU GPL, version 3 or later.
- Jar file: wp-code-helper.jar (49 KB) (Run with java -jar wp-code-helper.jar)
- Source *.java files: wp-code-helper-src.tgz (16 KB)
- NetBeans project (includes jar and src): wp-code-helper-nb-project.tgz (58 KB)
The class files were compiled with JDK 6, but I’m guessing they’ll compile and run under Java 4. I don’t think I used version 5 or 6 features, although I’m not sure what NetBeans might have used for the generated GUI code. In any case, I’d recommend the latest JVM since I’ve heard that Swing GUI performance has been greatly improved. (Not that this little interface will tax your system.)
Update, 10 Nov 2007: In reading more about NetBeans GUI layouts, I think this project does depend on Java 6 for the GroupLayout that is part of the core libraries. The “Swing Layout Extensions” library supports GroupLayout and is available for Java 5 (and earlier?).
I’ve tested the program with the Java 6 JVM in Ubuntu GNU/Linux and Windows XP.
Comments
- Hi,
wp-code-helper.jar is really a good help for me while writing code to my blog.
I would like to know whether there is any way I can get a larger font, as currently when I paste the converted code into my WordPress blog’s CODE part, it looks too small after publishing.
So could you suggest anything with regard to coding or editing your source code?
Regards,
Rajesh.
rajesh4it@gmail.com
Posted by rajesh on 5 March 2008 at 9:03 am