FVWM (with UTF-8) loading some very strange character sets

Paapaa · 9 August 2006 18:38

I’m trying to solve two very odd UTF-8 problems for real now. The cause is either in my config, my X installation or FVWM.

My locale: en_US.UTF-8
Fvwm version: fvwm 2.5.16 compiled on Aug 9 2006 at 00:11:21 with support for: ReadLine, XPM, PNG, Shape, XShm, SM, XRender, XFT, NLS

The 1st problem is that FVWM doesn’t change the “Default Charset” based on the locale setting, as described in this incoming bug (from 2005). There was a proposed fix to the file libs/FlocaleCharset.c that apparently works: after applying it, my “Default Charset” is now ISO10646-1, as it should be. Before the fix it was ISO8859-1, which is obviously incorrect. The output of “PrintInfo Locale 2” after the fix is pasted below. The output of the same command before the fix is here. This fix also got rid of the “invalid byte sequence during conversion from UTF-8 to ISO8859-1” errors I had.
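
Those conversion errors are presumably what you get when UTF-8 text is pushed through a converter that targets ISO8859-1. Here is a minimal sketch using plain iconv(3). It is not FVWM's own conversion code, and the helper name utf8_to_latin1 is made up for illustration.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not FVWM code: convert a NUL-terminated UTF-8
 * string to ISO-8859-1. Returns the number of bytes written to out, or
 * -1 if the conversion fails or cannot be done without loss.
 */
long utf8_to_latin1(const char *in, char *out, size_t out_size)
{
	assert(in != NULL && out != NULL);

	iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
	if (cd == (iconv_t)-1)
		return -1;

	char *src = (char *)in;        /* iconv() wants a non-const pointer */
	size_t src_left = strlen(in);
	char *dst = out;
	size_t dst_left = out_size;

	size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
	iconv_close(cd);

	/*
	 * (size_t)-1 means an error; a positive count means some characters
	 * were replaced in a non-reversible way. Both are treated as failure.
	 */
	if (rc != 0)
		return -1;
	return (long)(out_size - dst_left);
}
```

A string such as "café" converts fine because every character exists in ISO8859-1. A string containing the euro sign does not, so any title or menu text outside Latin-1 would trigger an error while the default charset is wrongly set to ISO8859-1.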

The 2nd problem is that even after the fix FVWM seems to load some very strange character sets which I don’t think should be there. These are: “ISO8859-1 ISO8859-1 JISX0208.1983-0 KSC5601.1987-0 GB2312.1980-0 JISX0201.1976-0”. As you can see, Japanese, Korean and Chinese character sets are being pulled in from somewhere. This causes FVWM to give errors about “Missing font charsets”, as I have reported earlier. I can get rid of some of these errors by installing Asian font packs, but I consider that a hack, not a fix. The only character set I use in my config files is ISO10646-1 (UTF-8). I have no idea what these “XOM Charsets” are or what is actually loading them.

I’d like to solve this problem, or at least find out how to start hunting down the cause of this behaviour. If I set my locale to “POSIX” none of this happens: no Asian character sets are loaded and the Default Charset is correctly set to ISO8859-1. Any help is much appreciated; solving this should help all UTF-8 users. I’ll also gladly assist as much as possible. I know C, so digging into the source is a possibility. If I should post this to some mailing list, please tell me.

Output of “PrintInfo Locale 2”:

FVWM info on locale:
locale: en_US.UTF-8, Modifier:
Default Charset: X: ISO10646-1, Iconv: UTF-8, Bidi: No
XOM Charsets: ISO8859-1 ISO8859-1 JISX0208.1983-0 KSC5601.1987-0 GB2312.1980-0 JISX0201.1976-0 ISO10646-1
Number of loaded font: 3

Font number 0
fvwm info:
Name: xft:Verdana:size=13:encoding=iso10646-1
Cache count: 2
Type: XftFont
Charset: X: ISO10646-1, Iconv: UTF-8, Bidi: No
height: 19, ascent: 16, descent: 4
shadow size: 0, shadow offset: 0, shadow direction:0
Xft info:

Vertical font: - Rotated font 90: None

Rotated font 270: None

Rotated font 180: None

Font number 1
fvwm info:
Name: xft:bitstream vera sans:size=10:encoding=iso10646-1
Cache count: 1
Type: XftFont
Charset: X: ISO10646-1, Iconv: UTF-8, Bidi: No
height: 14, ascent: 12, descent: 3
shadow size: 0, shadow offset: 0, shadow direction:0
Xft info:

Vertical font: - Rotated font 90: None

Rotated font 270: None

Rotated font 180: None

Font number 2
fvwm info:
Name: -*-fixed-medium-r-semicondensed-*-13-*-*-*-*-*-*-*,-*-fixed-medium-r-normal-*-14-*-*-*-*-*-*-*,-*-*-medium-r-normal-*-16-*-*-*-*-*-*-*
Cache count: 1
Type: FontSet
Charset: X: ISO10646-1, Iconv: UTF-8, Bidi: No
height: 14, ascent: 12, descent: 2
shadow size: 0, shadow offset: 0, shadow direction:0
X info:
-misc-fixed-medium-r-semicondensed--13-100-100-100-c-60-iso8859-1
-misc-fixed-medium-r-semicondensed--13-100-100-100-c-60-iso8859-1
-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0
-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0
-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1

Paapaa · 11 September 2006 12:53

I’ve been trying to figure out what is happening. It really seems X is requesting all those character sets even though the user is using a UTF-8 locale. FVWM gets this information from X with:

XGetOMValues(om, XNRequiredCharSet, &cs, NULL);

(From libs/FlocaleCharset.c) I still don’t know if this is normal behaviour. I have no idea why those Asian character sets or ISO8859-1 are required because we really are using just ISO10646-1.

The first problem reported in my first message seems to be a real bug: FVWM can’t correctly decide which charset to use as the default when there is more than one possibility. Currently FVWM always uses the first one reported by X (here ISO8859-1), even though the correct one is the last one, ISO10646-1.

So how to fix it? We can make FVWM pick the last charset in the list - as suggested in the original bug report. But is it always the correct one? Like this (libs/FlocaleCharset.c):

if (FLCXOMCharsetList_num > 0 && FLCXOMCharsetList[FLCXOMCharsetList_num - 1])
	FLCXOMCharset = FLCXOMCharsetList[FLCXOMCharsetList_num - 1];

Or there might have to be special cases for locales which make X request more than one charset. But in what situations does X require more than one charset? UTF-8 is one case, but what are all the others? And in what order does X report the required charsets? Any help or guidance is greatly appreciated. (Or should I post to the developers’ mailing list?)
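
Picking by position (first or last) depends on the order X happens to report. One possible alternative, sketched here as an assumption and not as FVWM's actual code, is to pick the entry that matches the codeset of the current locale and to fall back to the first entry otherwise. The function name pick_default_charset, the struct and the mapping table are all hypothetical. The codeset string would come from nl_langinfo(CODESET) after setlocale(LC_CTYPE, "").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a locale codeset to an X charset name. */
struct codeset_map {
	const char *codeset;
	const char *x_charset;
};

static const struct codeset_map known_codesets[] = {
	{ "UTF-8",       "ISO10646-1" },
	{ "ISO-8859-1",  "ISO8859-1"  },
	{ "ISO-8859-15", "ISO8859-15" },
};

/*
 * Return the entry of list that corresponds to codeset. If the codeset is
 * unknown or its charset is not in the list, return the first non-NULL
 * entry. Returns NULL when the list has no usable entry.
 */
const char *pick_default_charset(const char *const *list, int num,
				 const char *codeset)
{
	assert(codeset != NULL);

	const char *wanted = NULL;
	size_t n = sizeof known_codesets / sizeof known_codesets[0];
	for (size_t i = 0; i < n; i++) {
		if (strcmp(known_codesets[i].codeset, codeset) == 0) {
			wanted = known_codesets[i].x_charset;
			break;
		}
	}

	if (wanted != NULL) {
		for (int i = 0; i < num; i++) {
			if (list[i] != NULL && strcmp(list[i], wanted) == 0)
				return list[i];
		}
	}

	/* Fallback: first usable entry, whatever X listed first. */
	for (int i = 0; i < num; i++) {
		if (list[i] != NULL)
			return list[i];
	}
	return NULL;
}
```

With the XOM list from my first post and the codeset "UTF-8" this picks ISO10646-1 regardless of where it sits in the list. With a codeset that is not in the table it falls back to the first entry, ISO8859-1. Whether such a table covers all the locales that make X request several charsets is exactly the open question.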

This was tested on Xorg 7.0.0. I also filed a bug report about this at freedesktop:

bugs.freedesktop.org/show_bug.cgi?id=8205

morbusg · 11 September 2006 15:52

I dunno if these have any effect on fvwm/X, but have you tried something along the following lines in your profile?

export LC_CTYPE=fi_FI.UTF-8
export LANG=fi_FI.UTF-8
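
For reference, which of these variables wins follows the usual POSIX order: LC_ALL first, then the category variable (here LC_CTYPE), then LANG. A small sketch of that lookup, assuming "C" as the default when nothing is set; the function name effective_ctype_locale is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the values of LC_ALL, LC_CTYPE and LANG
 * (NULL or "" when unset), return the one that decides the character
 * type locale. Falls back to "C" when none of them is set.
 */
const char *effective_ctype_locale(const char *lc_all, const char *lc_ctype,
				   const char *lang)
{
	const char *candidates[] = { lc_all, lc_ctype, lang };
	const char *result = "C";	/* assumed default when nothing is set */

	for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
		if (candidates[i] != NULL && candidates[i][0] != '\0') {
			result = candidates[i];
			break;
		}
	}

	assert(result[0] != '\0');	/* never returns an empty name */
	return result;
}
```

It would be called as effective_ctype_locale(getenv("LC_ALL"), getenv("LC_CTYPE"), getenv("LANG")). With the two exports above and LC_ALL unset, the result is fi_FI.UTF-8.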

Paapaa · 11 September 2006 17:29

As you can see in my first post, I am using the locale “en_US.UTF-8”. And yes, they do have an effect in fvwm/X, but the effect is currently a bit broken.