Unicode in Linux

UTF-8 is a very nice encoding that makes Unicode easy to use. Although switching to UTF-8 has many benefits, the reason I tell people I use it is so that I can type pretty glyphs and the euro sign (so what if I live in the US?).

In Windows, however, even though pretty much every application can accept UTF-8 as input, typing the characters you need is difficult. Your best bet is to memorize character codes and type Alt + code, or to switch to a keyboard layout where the character is native. I'm not too game for either approach. I prefer X's approach of using a configurable Compose key. You press the Compose key and then a key chain for a given character. Each key chain was designed to make as much sense as possible. For example, a pretty open quote, ' “ ', can be typed with Compose + < + ". Likewise, for a close quote, ' ” ', the key chain is Compose + > + ". This lets you actually remember how to type each character you care about. If that isn't good enough, or if there isn't a key chain for a character you want, you can configure the chains in the file ~/.XCompose.

There are two large problems with the Compose key, however: the key has no default setting, and most Linux users have no earthly clue that the feature is available to them. It is actually easy (relatively, for Linux) to set up the key if you know about it: add the line Option "XkbOptions" "compose:menu" to the keyboard's InputDevice section in /etc/X11/xorg.conf, restart X, and you have enabled the feature. That line makes the Menu key the Compose key (the Menu key's cap shows a drop-down menu with a mouse pointer on one of its items), but other keys are allowed. In my install, I have two other options: ralt and rwin. To see the allowed settings for yourself, just look at /usr/X11R6/lib/X11/xkb/rules/xorg.lst (assuming you do not specify an XkbRules option; if you do, view the .lst file with the same rule name instead of xorg.lst). The default key chains differ for each locale/charset.
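
Rather than reading the listing file by eye, you can filter it for the Compose choices. This is only a sketch: it assumes the listing keeps one option per line beginning with "compose:", and the function name list_compose_options is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the "compose:" entries from an xkb rules listing.
# Assumes one option per line, e.g. "  compose:ralt   Right Alt is Compose".
list_compose_options() {
    grep -E '^[[:space:]]*compose:' "$1"
}

# Example (path taken from the text above; adjust it for your system):
#   list_compose_options /usr/X11R6/lib/X11/xkb/rules/xorg.lst
#
# To try a choice in the running session without editing xorg.conf:
#   setxkbmap -option compose:menu
```

The setxkbmap line only affects the current X session, so it is a handy way to test a key before committing it to xorg.conf.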

Now the issue is figuring out how to type a certain character. All of the key chains are in a plain-text configuration file, /usr/X11R6/lib/X11/locale/%LOCALE%/Compose. My locale is en_US.UTF-8. To find your locale you can use the “locale” command. All you really have to do is grep the file for the character you are looking for. I use “gucharmap” for Unicode information and as a normal character map. Here is a script that I found that might make the grepping a little easier (you must edit $dir to fit your system):

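#!/bin/sh
# Usage: pass the character to look up as the first argument.
# Directory that holds your locale's Compose file; edit to fit your system.
dir=/usr/X11R6/lib/X11/locale/en_US.UTF-8
# Print every line of the Compose file that contains the given character.
grep -F -- "$1" "$dir/Compose"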
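
If you only want the key chain and not the whole matching line, a variant along these lines may help. It is a sketch under a few assumptions: the function name compose_chains is hypothetical, the Compose file path is passed as a second argument, and each entry has the form chain, colon, then the quoted character. On many newer systems the locale directory lives under /usr/share/X11/locale rather than /usr/X11R6/lib/X11/locale.

```shell
#!/bin/sh
# Hypothetical helper: show only the key chains that produce a character.
# $1 = character to look up, $2 = path to a Compose file (assumed argument).
compose_chains() {
    # Select lines that contain the character in double quotes, then drop
    # everything from the separating colon onward, leaving just the chain.
    grep -F -- "\"$1\"" "$2" | sed 's/[[:space:]]*:.*$//'
}

# Example:
#   compose_chains "…" /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose
```

Matching the quoted form keeps out lines that merely mention the character in a comment. It will not find the double-quote character itself, which Compose files write in escaped form.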

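If you create a ~/.XCompose file, you must pull in the normal key chains yourself with the line “include "%L"” (%L expands to the path of your locale's default Compose file); otherwise only the chains you define will work. Here is my ~/.XCompose file: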

include "%L"
<Multi_key> <minus> <minus> <underscore> : "‐" U2010 # HYPHEN
<Multi_key> <3> <period> : "…" U2026 # HORIZONTAL ELLIPSIS
<Multi_key> <period> <3> : "…" U2026 # HORIZONTAL ELLIPSIS
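
If you would rather add chains from the command line than edit the file by hand, something like the following could work. It is a minimal sketch, not a polished tool: the function name add_compose_chain and its argument order are invented for illustration, and it does not check whether the chain already exists.

```shell
#!/bin/sh
# Hypothetical helper: append a key chain to an XCompose-style file.
# $1 = file, $2 = key chain, $3 = resulting character.
add_compose_chain() {
    # Create the file with the include line first if it does not exist yet,
    # so the locale's default chains stay available.
    [ -f "$1" ] || printf 'include "%%L"\n' > "$1"
    # Append the new entry in the form: chain : "character"
    printf '%s : "%s"\n' "$2" "$3" >> "$1"
}

# Example:
#   add_compose_chain "$HOME/.XCompose" '<Multi_key> <3> <period>' '…'
```

New chains are typically only picked up by applications started after the file changes, so restart the program you are testing with.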