I’m now seriously trying to solve two very odd UTF-8 problems. The cause is either in my config, my X installation or FVWM.
My locale: en_US.UTF-8
Fvwm version: fvwm 2.5.16 compiled on Aug 9 2006 at 00:11:21
with support for: ReadLine, XPM, PNG, Shape, XShm, SM, XRender, XFT, NLS
The 1st problem is that FVWM doesn’t change the “Default Charset” based on the locale setting as described in this incoming bug (from 2005). There was a proposed fix regarding the file libs/FlocaleCharset.c that apparently did something. After applying the fix, my “Default Charset” is now ISO10646-1 as it should be. Before the fix it was ISO8859-1 which is obviously incorrect. The output of “PrintInfo Locale 2” after the fix is pasted below. The output of the same command before the fix is here. This fix also got rid of “invalid byte sequence during conversion from UTF-8 to ISO8859-1” errors I had.
The 2nd problem is that even after the fix FVWM seems to load some very strange character sets which I don’t think should be there. These are: “ISO8859-1 ISO8859-1 JISX0208.1983-0 KSC5601.1987-0 GB2312.1980-0 JISX0201.1976-0”. As you can see, Japanese, Korean and Chinese character sets are pulled in from somewhere. This causes FVWM to give errors about “Missing font charsets”, as I have reported earlier. I can get rid of some of these errors by installing Asian font packs, but I consider that a hack, not a fix. The only character set I use in my config files is ISO10646-1 (UTF-8). I have no idea what these “XOM Charsets” are or what is actually loading them all.
I’d like to solve this problem, or at least to know where to start hunting down the cause of this behaviour. If I set my locale to “POSIX” none of this happens: no Asian character sets are loaded and the Default Charset is correctly set to ISO8859-1. Any help is much appreciated - solving this should help all UTF-8 users. I’ll also gladly assist as much as possible - I know C, so digging into the source is a possibility. If I should post this to some mailing list, please tell me.
I’ve been trying to figure out what is happening. It really seems that X is requesting all those character sets even though the user is using a UTF-8 locale. FVWM gets this information from X with:
XGetOMValues(om, XNRequiredCharSet, &cs, NULL);
(From libs/FlocaleCharset.c) I still don’t know if this is normal behaviour. I have no idea why those Asian character sets or ISO8859-1 are required because we really are using just ISO10646-1.
The first problem reported in my first message seems to be a real bug: FVWM can’t correctly decide which charset to use as the default when X reports more than one. Currently FVWM always uses the first one reported by X (here ISO8859-1), even though the correct one is the last one, ISO10646-1.
So how to fix it? We can make FVWM pick the last charset in the list - as suggested in the original bug report. But is it always the correct one? Like this (libs/FlocaleCharset.c):
if (FLCXOMCharsetList_num > 0 && FLCXOMCharsetList[FLCXOMCharsetList_num-1])
    FLCXOMCharset = FLCXOMCharsetList[FLCXOMCharsetList_num-1];
Or there might need to be special cases for locales which make X request more than one charset. But in what situations does X require more than one charset? UTF-8 is one case, but what are all the others? And in what order does X report the required charsets? Any help or guidance is greatly appreciated. (Or should I post to the developers’ mailing list?)
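Since I don't know whether the order of the list can be relied on, another option I can imagine is to ignore the order and look for the charset that matches the locale's codeset. Below is only a rough sketch of that idea, not FVWM code: the function names codeset_to_x_charset and pick_default_charset are made up for illustration, the codeset table is deliberately tiny, and I am assuming that glibc reports the POSIX locale's codeset as "ANSI_X3.4-1968". If nothing matches, it falls back to the first entry, which is what FVWM does now.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper, not part of FVWM: map a locale codeset name (such as
 * the string returned by nl_langinfo(CODESET)) to the X charset we would
 * prefer as the default. Only two cases are covered in this sketch. */
const char *codeset_to_x_charset(const char *codeset)
{
    if (codeset == NULL)
        return NULL;
    if (strcasecmp(codeset, "UTF-8") == 0 || strcasecmp(codeset, "UTF8") == 0)
        return "ISO10646-1";
    /* Assumption: "ANSI_X3.4-1968" is what glibc reports for POSIX/C. */
    if (strcasecmp(codeset, "ISO-8859-1") == 0 ||
        strcasecmp(codeset, "ANSI_X3.4-1968") == 0)
        return "ISO8859-1";
    return NULL;   /* unknown codeset: the caller falls back */
}

/* Hypothetical helper: return the index of the charset to use as default.
 * Prefers the entry matching the locale codeset regardless of its position;
 * otherwise returns 0 (the first entry). Returns -1 for an empty list. */
int pick_default_charset(char *const *list, int count, const char *codeset)
{
    const char *wanted = codeset_to_x_charset(codeset);
    int i;

    if (list == NULL || count <= 0)
        return -1;
    if (wanted != NULL) {
        for (i = 0; i < count; i++) {
            if (list[i] != NULL && strcasecmp(list[i], wanted) == 0)
                return i;
        }
    }
    return 0;
}
```

In a real program the codeset argument would come from nl_langinfo(CODESET) after calling setlocale(LC_CTYPE, ""), and the list would be the one X returns for XNRequiredCharSet. Whether matching by name like this is actually safer than taking the last entry is exactly what I don't know yet.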
This was tested on Xorg 7.0.0. I also filed a bug report about this at freedesktop: