The Power of an Integrated Platform

FreeBSD, unlike Linux, is not just a kernel but a complete operating system. This allows a much smoother integration of its components, which is a real boon when you try to locate and fix a problem. The source code for all its parts is organized in a single directory tree that you can examine and experiment with.

I came to appreciate this yesterday, when I spent a few hours translating two sentences for the Greek edition of my book Code Quality: The Open Source Perspective. The example in question, part of the section describing internationalization and localization, showed the locale-specific sorting of German words containing non-ASCII characters. I naturally wanted to replace it with a Greek equivalent, but the names I tried to sort refused to follow the Greek collating sequence. Specifically, I expected letters with a stress to follow their plain equivalents, rather than being placed in ISO-8859-7 or Unicode code order, which puts them at the ends of the alphabet.

The system refused to follow my orders. Keeping the system's complete source code installed is very common on FreeBSD machines, and I find this incredibly useful. It was thus easy for me to locate the file containing the collating sequence specification for Greek. This showed me that the order I wanted wasn't really specified. For instance, in the following excerpt, alpha with a stress (a%) and plain alpha (a*) are listed as separate, consecutive entries, so each gets its own position in the sequence.

# small
        a;...;z;\
        <a%>;<a*>;<b*>;<g*>;<d*>;<e%>;<e*>;<z*>;<y%>;\
        <y*>;<h*>;<i3>;<j*>;<i%>;<i*>;<k*>;<l*>;<m*>;\
        <n*>;<c*>;<o%>;<p*>;<o*>;<r*>;<s*>;<*s>;<t*>;\
        <u3>;<v*>;<u%>;<u*>;<f*>;<x*>;<q*>;<w%>;<w*>;\
I therefore modified it, placing each plain letter and its accented variants in the same parenthesized group, and "recompiled" it.
# small
        a;...;z;\
        (<a%>,<a*>);<b*>;<g*>;<d*>;(<e%>,<e*>);<z*>;(<y%>,\
        <y*>);<h*>;(<i3>,<j*>,<i%>,<i*>);<k*>;<l*>;<m*>;\
        <n*>;<c*>;(<o%>,<o*>);<p*>;<r*>;<s*>;<*s>;<t*>;\
        (<u3>,<v*>,<u%>,<u*>);<f*>;<x*>;<q*>;(<w%>,<w*>);\
(Naturally, the colldef command required for compiling the collation specification into a binary form was part of the base system.)

Then, however, the sort and ls commands I tried from the shell refused to load the new file. (I found that out by tracing their system calls with truss.) My next step was to look at the source code of the setlocale library function. Unfortunately, the code was inscrutable to me; it included gems like the following.

for (i = 1; r[1] == '/'; ++r)
	;
if (!r[1]) {
	errno = EINVAL;
	return (NULL);
}

However, it was easy to recompile the function with debugging enabled and link it against one of the commands. Stepping through the code, I found that the path I had specified through LANG and LC_ALL for the new location of the collation file was too long, so the function was failing at the following check.

if ((len = r - locale) > ENCODING_LEN) {
	errno = EINVAL;
	return (NULL);
}
Furthermore, looking at the source code of ls, I found that the command did not check the return value of setlocale, so when the call failed it did not report an error.
        (void)setlocale(LC_ALL, "");
Ten minutes later, again by browsing the source code of setlocale, I understood the purpose of the PATH_LOCALE environment variable and had the sorting working exactly as I wanted.

All that remains now is to commit my fix to the FreeBSD source tree. By the time the translated book is on the bookshelves, the system will work as advertised!

Last modified: Thursday, February 21, 2008 8:15 am

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.