Eric Anderson

Updates

March 19, 2006

This blog has been very lonely for the past few months, and I am finally using it again. I finished a few of the tasks that needed to be done a while back, but not all of them.

Most noticeably, I broke the RSS feed (I have no clue how!). It works perfectly fine from the command line, but not from Apache... Since I don't like RSS that much anyway, I am not too sad about it breaking.

I have now deprecated the Atom 0.3 feed, so there are no more links to it. I thought that the feed was valid, but checking it at feedvalidator.org shows otherwise. The feed is invalid, Atom 1.0 has had many more months to get implemented in aggregators, and Atom 0.3 has become officially deprecated, so I see no reason to put in the effort to make it validate; I am just deprecating it altogether.

I finally fixed the stupid double slash in my URLs and removed the leading space in most links (not a real issue, but it was just code cosmetics). I also changed one thing in the Atom 1.0 feed, although I do not remember what now.

You will find a new Atom 1.0 Summaries feed. This was going to be in addition to the RSS feed, but since the RSS feed broke, it will just be a substitute for it for those people who like to read posts on the actual site. The summary for a post is now character-limited and cut at a word boundary instead of taking the first paragraph.
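The site itself does this in XSLT, but the idea of cutting a summary at a word boundary can be sketched in a few lines of Python. This is only an illustration: the function name, the 200-character default, and the trailing "..." are my assumptions, not what the feed actually uses.

```python
def summarize(text, limit=200, ellipsis="..."):
    """Cut text to at most `limit` characters (plus the ellipsis), on a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    # Look one character past the limit so a word that fits exactly is kept.
    cut = text[:limit + 1]
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]   # back up to the last complete word
    else:
        cut = cut[:limit]   # one very long word: fall back to a hard cut
    return cut.rstrip() + ellipsis


print(summarize("The quick brown fox jumps", limit=12))
```

Backing up to the last space means the summary may come out a little shorter than the limit, but a word is never split in half.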

Those who are very observant may have noticed that all the CSS files now exist. I split the one stylesheet that I had into several different stylesheets and added a print stylesheet. In addition, I actually created the fixed stylesheet layout, so now the "Choose a style" links work. The original CSS did not completely validate, and this has been fixed as well (so that if I removed the hacks and the Mozilla-specific parts it would validate).

Lighttpd

I have tried out and really enjoyed the lighttpd HTTP server. I hope to eventually swap this server over to it. The main problem with swapping is that I cannot use mod_xslt, so I will need to find or develop a component to perform the XSLT. If you have not tried out lighty, you should.

Unicode in Linux

August 4, 2005

UTF-8 is a very nice tool that allows easy usage of Unicode. Although swapping to UTF-8 has many benefits, the reason I tell people I use the encoding is to write pretty glyphs and the euro (so what if I live in the US?).

However, in Windows, even though every application can pretty much accept UTF-8 as input, it is difficult to input needed characters. Your best bet is to memorize character codes and do Alt + code or swap to the keyboard where such a character is native. I'm not too game for these approaches. I prefer X’s approach of using a configurable Compose key. You press the compose key and then a key chain for a given character. Each key chain was created by some person so as to make as much sense as possible. For example, a pretty open quote, ' “ ', can be typed by Compose + < + ". Likewise, for a close quote, ' ” ', the key chain is Compose + > + ". This allows you to actually remember how to type each character you care about. If that isn't good enough, or if there isn’t a key chain for a character you want, you can configure the chains with the file ~/.XCompose.

There are two large problems with the Compose key, however: the key has no default setting, and Linux users for the most part have no earthly clue that such a feature is available to them. It is actually easy (relatively, for Linux) to set up the key if you know about it: add “Option "XkbOptions" "compose:menu"” to the keyboard's InputDevice section in /etc/X11/xorg.conf, restart X, and you have enabled the feature. The given line will make the Menu key the compose key (the key whose label shows a drop-down menu with a mouse pointer on one of its items), but other keys are allowed. In my install, I have two other options: ralt and rwin. To see the allowed settings for yourself, just look at /usr/X11R6/lib/X11/xkb/rules/xorg.lst (assuming you do not specify an XkbRules option; if you do, view the file with the same rule name instead of xorg.lst). The default key bindings change for each locale/charset.

Now the issue is figuring out how to type a certain character. All of the key chains are in a normal configuration file in /usr/X11R6/lib/X11/locale/%LOCALE%/Compose. My locale is en_US.UTF-8. To find your locale you can use the “locale” command. All you really have to do is grep the file for the character you are looking for. I use “gucharmap” for Unicode information and as a normal character map. Here is a script that I found that might make the grepping a little easier (you must edit $dir to fit your system):

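#!/bin/sh
# Usage: pass the character to look up as the first argument.
# Edit this path to fit your system.
dir=/usr/X11R6/lib/X11/locale/en_US.UTF-8
dir="$dir/`sed -n "s#\([^/]*\)/.*:.*$LANG#\1#p" < $dir/Compose`"
# Print every line of the Compose file that contains the character.
grep -F -- "$1" "$dir/Compose"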

If you create a ~/.XCompose file, then you must include the normal key bindings with the line “include "%L"”. Here is my ~/.XCompose file:

include "%L"
<Multi_key> <minus> <minus> <underscore> : "‐" U2010 # HYPHEN
<Multi_key> <3> <period> : "…" U2026 # HORIZONTAL ELLIPSIS
<Multi_key> <period> <3> : "…" U2026 # HORIZONTAL ELLIPSIS
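If grep is not enough, the line format shown above is simple enough to parse. The following Python sketch is my own illustration (the function names are made up, and it only handles plain key-chain lines, leaving any backslash escapes in the quoted result untouched); it lists the key chains that produce a given character.

```python
import re

# Matches lines such as: <Multi_key> <3> <period> : "…" U2026 # HORIZONTAL ELLIPSIS
COMPOSE_LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def parse_compose_line(line):
    """Return (list of key names, resulting string), or None for other lines."""
    match = COMPOSE_LINE.match(line)
    if match is None:
        return None  # comments, blank lines, include directives, ...
    keys = re.findall(r"<([^>]+)>", match.group(1))
    return keys, match.group(2)


def chains_for(char, lines):
    """Collect every key chain in `lines` that produces `char`."""
    found = []
    for line in lines:
        parsed = parse_compose_line(line)
        if parsed is not None and parsed[1] == char:
            found.append(parsed[0])
    return found


my_xcompose = [
    'include "%L"',
    '<Multi_key> <minus> <minus> <underscore> : "‐" U2010 # HYPHEN',
    '<Multi_key> <3> <period> : "…" U2026 # HORIZONTAL ELLIPSIS',
    '<Multi_key> <period> <3> : "…" U2026 # HORIZONTAL ELLIPSIS',
]
for chain in chains_for("…", my_xcompose):
    print(" + ".join(chain))
```

To search a system-wide Compose file instead, pass the lines of that file (opened as UTF-8) in place of the sample list.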

Overview of my Server-Side Processing

August 2, 2005

The base of this blog is a hierarchy of XML files. They are for the most part all named index.xml and all exist in the “http://ersoft.org/blog” namespace. In addition, all the files are saved as UTF-8. A file can consist of several parts, but this is an example that shows all possibilities:

<index title='Pretty Title' time='22:20:00' author='Eric' name='realname'
       xmlns="http://ersoft.org/blog" type='root|group|year|month|day|post'>
	<header>
	   <!-- Nodes will be placed in the header of an XHTML document. -->
	</header>
	<sidebar>
	   <!-- Nodes will be placed in the sidebar of an XHTML document. -->
	</sidebar>

	<listing>
	   <!-- A listing is used to refer to another child xml document -
		such as those used to form the date, time, etc. of a post.
	     -->
		<index>name</index>
		   <!-- The name here is the folder name that the index.xml
			exists in.
		     -->
	</listing>

	<include>/pseudo/path/to/file.xml</include>
	   <!-- This is what is used for the recent posts. It includes the xml
		from the file into the current document. No automation to
		determine the most recent posts currently exists.
	     -->

   <!-- Any other nodes will be assumed to be XHTML and be copied to the output
	when needed.
     -->
</index>

This format is very raw and not very imaginative, but it seems to work well. I plan to eventually remove the need for the ‘name’ attribute, but other than that, it should stay pretty consistent.
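To show how little is needed to consume this format, here is a rough Python sketch (not part of the site, which uses XSLT for everything) that reads one index file and pulls out its attributes, its child listings, and its includes. The sample document, the 'b' prefix, and the describe() helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Prefix 'b' is arbitrary; only the namespace URI comes from the format above.
NS = {"b": "http://ersoft.org/blog"}

# A made-up index file in the format described above.
SAMPLE = """<index title='Pretty Title' author='Eric' name='2005'
       xmlns="http://ersoft.org/blog" type='year'>
  <listing>
    <index>07</index>
    <index>08</index>
  </listing>
  <include>/path/to/file.xml</include>
</index>"""


def describe(xml_text):
    """Summarize one index document as a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        # Attributes are not namespaced, so plain names work here.
        "title": root.get("title"),
        "type": root.get("type"),
        # Child elements inherit the default namespace, so they need the prefix.
        "children": [el.text for el in root.findall("b:listing/b:index", NS)],
        "includes": [el.text for el in root.findall("b:include", NS)],
    }


print(describe(SAMPLE))
```

Each name under "children" is a folder holding another index.xml, so walking the whole blog is just a matter of repeating this for every child.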

So that brings me to the XSLT. It is split up into two main files: xhtml.xsl and blog.xsl. xhtml.xsl is the file that has the formatting for my site while blog.xsl has the templates, etc., for any blog. So, in general, when I make infrastructure improvements I only need to modify blog.xsl and when I make presentation improvements I only have to modify xhtml.xsl.

Main Problem Encountered

The main problem that I had was that XSLT 1.0 does not allow a variable to be recognized as a node-set if its contents were created on the fly. What this meant for me is that each time I needed the root of my blog, I had to recursively find the root and then perform the needed operation. This made me cringe, so I sought a way to avoid it.

The way I now avoid the problem is to save the path to the root instead of the root node itself. This works because setting a variable to a string is easy. This is still not perfect, but it produces less duplicate “code.”
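The trick is not specific to XSLT. As a loose analogy in Python (the folder names, the dictionary of index types, and the function name are all hypothetical), the root is found once and remembered as a plain string of "../" steps, which can then be reused wherever the root is needed:

```python
import posixpath


def path_to_root(start, index_types):
    """Return '', '../', '../../', ... leading from `start` up to the root index.

    `index_types` maps a folder to the 'type' attribute of its index.xml.
    """
    steps = 0
    current = start
    while index_types.get(current) != "root":
        parent = posixpath.dirname(current)
        if parent == current:
            raise ValueError("no root index found above " + start)
        current = parent
        steps += 1
    return "../" * steps


# A made-up folder hierarchy in the spirit of the blog's layout.
types = {
    "/blog": "root",
    "/blog/2005": "year",
    "/blog/2005/08": "month",
    "/blog/2005/08/02": "day",
}
print(path_to_root("/blog/2005/08/02", types))
```

The string is cheap to store and pass around, at the price of having to open the root again each time it is used.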

The best part about this problem, however, is that it is temporary — XSLT 2.0 allows a variable to contain a node-set easily.

Nice Features

There are just a few nice features that I have implemented. The nicest in my opinion is how acronyms and abbrs work. There are two files, acronyms.xml and abbrs.xml, that contain a list of acronyms and abbreviations that I use on my blog. In my XML I only place the correct acronym or abbr tag around text that needs it, but I do not declare the title or any other attribute. The XSLT does this when it comes across either tag, filling in the attributes from the two XML files.

Such a feature is commonly done in PHP; however, the PHP approach has to compare every string against every possible acronym or abbr. In this case, the XSLT only has to compare every denoted acronym or abbreviation against the known values. Of course, the benefit comes at a cost to the writer, who has to mark each one by hand; however, doing it differently would not be easy without XSLT 2.0.
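For readers who do not speak XSLT, the lookup can be sketched in Python. The layout of acronyms.xml below is a guess for the sake of the example (the real file may look different), as are the function names; the point is only that tagged elements are looked up by their text and given a title.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for acronyms.xml; the real file's structure is not shown here.
ACRONYMS_XML = """<acronyms>
  <acronym title="Extensible Stylesheet Language Transformations">XSLT</acronym>
  <acronym title="Really Simple Syndication">RSS</acronym>
</acronyms>"""


def load_titles(xml_text):
    """Build a lookup table from acronym text to its title."""
    root = ET.fromstring(xml_text)
    return {el.text.strip(): el.get("title") for el in root.iter("acronym")}


def fill_titles(fragment, titles):
    """Add a title attribute to every acronym element that lacks one."""
    root = ET.fromstring(fragment)
    for el in root.iter("acronym"):
        key = (el.text or "").strip()
        if el.get("title") is None and key in titles:
            el.set("title", titles[key])
    return ET.tostring(root, encoding="unicode")


post = "<p>This site is built with <acronym>XSLT</acronym>.</p>"
print(fill_titles(post, load_titles(ACRONYMS_XML)))
```

Only the elements the writer tagged are ever looked up, which is why this stays cheap compared with scanning all of the text for every known acronym.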

The only other nice feature is really being allowed to modify the header and sidebar without having to do PHP in each document when needed. This came in handy on the Recent Posts page.

Atom v. RSS

July 29, 2005

Both Atom and RSS get the job done when it comes to providing feeds of site content to users, but it's decided: Atom wins hands down. Just take a look at RSS 2.0 and Atom 1.0 Compared. I will admit that it may be easier to generate RSS 2.0 because it has fewer requirements (because you get so many standards to choose from - yea!); however, Atom fits very nicely into the XML suite of standards that I have learned, whereas RSS does not.

RSS 2.0 was a much-needed 'standard' to combine all the previous RSS versions as well as prepare the way for future formats, like Atom, but it has many shortfalls. It was known to be less than perfect, but instead of creating more confusion with a new version, its authors locked the standard and declared that any successors should be given a new name. Atom is that new name and has one large shortfall: it is not as widely supported as RSS. However, such a thing is temporary.

Short of not having a namespace (which just irks me), RSS is not intended to include XHTML, and it shows. That is why I didn't even go there and just have summaries instead of full-text posts on the feed. Also, the channel element always threw me off - if I squinted, I could see a purpose, but otherwise I found it useless. As a developer using XSLT, I found that RSS made it very difficult to specify times, even though its date format was a perfectly valid standard of the day. I needed to use EXSLT's format-date function. However, no XSLT processor supports such a thing. So I end up including an XSLT style sheet that implements the function. This isn't that bad, but it hurts to take a 40% performance decrease. What is even better is that I get to put Atom's date format into the format-date function since this is the standard format used in XSLT. Of course, if you are using PHP, you don't care that much, but at least with Atom you don't have to look up the argument syntax for date().
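To make the difference between the two date formats concrete, here is a small Python sketch (not what this site runs; the helper names and the noon timestamp are made up for the example). RSS 2.0 wants an RFC 822 style date, while Atom 1.0 wants an RFC 3339 timestamp, which is the same shape as the XML Schema dateTime values that XSLT tools already work with.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(moment):
    """RFC 822 style date, as used by the pubDate element in RSS 2.0."""
    return format_datetime(moment)


def atom_date(moment):
    """RFC 3339 style timestamp, as used by the updated element in Atom 1.0."""
    return moment.isoformat()


# An arbitrary time on the day of this post, in UTC.
posted = datetime(2005, 7, 29, 12, 0, 0, tzinfo=timezone.utc)
print(rss_date(posted))
print(atom_date(posted))
```

The Atom form sorts correctly as plain text and needs no day or month names, which is a large part of why it is easier to produce from a stylesheet.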

Atom in general makes much more sense and is more familiar looking than RSS.

First Post - Heck yeah... No php

July 28, 2005

Okay, I have now done something with XSLT that should not have been done. This entire site is generated on-the-fly with XML data files in a folder-oriented hierarchy and XSLT. Currently all text other than the sidebar and footer is completely dynamic.

I checked to see how slow the XSLT was, because I was loading many files and such. On my AMD64 it took .015 seconds (real) to build a page on this blog, compared to .075s for a blank PHP file. That is nice when you begin running the markup on a server that is 13.5 times slower than your machine (and 32-bit). In addition, since libxml2, libxslt, and mod_xslt are so small, they will not require as much RAM (which is nice if you only have 16MiB).

Organization

I use the following list of standards and solutions to form this site:

XHTML 1.1
CSS 2.0 (plus one Mozilla extension)
Javascript
XSLT 1.0
mod_rewrite

Although I am using the XHTML 1.0 doc-type, the markup is based on XHTML 1.1. This is because I have to use the text/html media type. I am using CSS 2.0 to do some of the nicer presentational effects without adding markup. I still have to split up the CSS into multiple files. The Javascript is just to give IE some of the CSS 2.0 features of the site and to swap between a fixed and fluid layout (will be working after I separate the CSS). The XSLT converts the XML files into a page. And mod_rewrite just makes the "Recent Posts" the main page (try /index).

I will need to use PHP to build a front-end for adding posts, but posts are not too much of a pain to write by hand.

When I have more time I will describe more of how it all works and enable a way for the XML files to be viewed raw, but for now just look in /shared and see what you can find.