iconv and non-identical/non-reversible conversions

According to the man page, iconv(3C) returns the number of non-identical conversions (non-reversible conversions, in Gnu's vocabulary) it performed. But what is a non-identical/non-reversible conversion? Try the following example:

$ echo "abc测试" | iconv -f UTF-8 -t ASCII
abc??

As you can see, the last two characters are outside the ASCII range, although they are valid UTF-8 characters. In this case iconv(3C) is expected to convert each of them to a substitute character (the non-identical-conversion character, which in some implementations is '?') and return 2. Gnu's iconv, however, raises an error here (it returns -1 and sets errno to EILSEQ), even though, according to its man page, EILSEQ indicates an invalid multibyte sequence in the input.

So checking the return value is not a portable way to tell whether the target encoding can represent the source contents. Nor should you rely on the count of non-identical conversions: a call may return -1 with errno set to E2BIG simply because the output buffer is exhausted, even though non-identical conversions have already happened in the part that was converted. And there is no portable flag in iconv(3C)/iconv_open(3C) to control whether to perform non-identical conversions or to raise an error.
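As a point of comparison, here is a minimal sketch of checking representability without relying on iconv's return value. It uses Python 3's built-in codecs rather than iconv(3C), so it says nothing about how any particular iconv behaves; the helper name unrepresentable is made up for this example.

```python
def unrepresentable(text, encoding):
    """Return the characters of text that the target encoding cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


sample = "abc测试"
# The two CJK characters cannot be represented in ASCII.
print(unrepresentable(sample, "ascii"))          # ['测', '试']
# Asking for substitution explicitly gives the same result as the shell example.
print(sample.encode("ascii", errors="replace"))  # b'abc??'
```

Here the choice between "substitute" and "fail" is an explicit argument (errors="replace" versus the default strict mode), which is exactly the control that iconv(3C) lacks.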

In general, iconv(3C) is not a well-defined interface.

P.S. This post summarizes a discussion between the JDS/Evolution team and myself while we were isolating an iconv(3C)-related bug.

glibc on OpenSolaris

I just saw a message on [osol-discuss] saying that David is working on porting Gnu libc to Solaris/OpenSolaris and has made really impressive progress. Like the effort to port glibc to BSD, it could make GNU/kOpenSolaris (the OpenSolaris kernel plus a GNU userland) a reality.

Here are the references:

  1. http://csclub.uwaterloo.ca/~dtbartle/opensolaris
  2. https://savannah.nongnu.org/projects/glibc-bsd

python and Gnu-libreadline

You may know that the interactive shell of python uses Gnu-libreadline (licensed under the GPL) to implement history retrieval. But does this affect your own python program, or an application that embeds the python interpreter?

I did a little study of the python interpreter, libraries, and built-in modules on linux (built with libreadline). It looks like only one extension module links to libreadline: /usr/lib/python2.5/lib-dynload/readline.so. This module is loaded only by the interactive python shell. When I launch an ordinary python script directly, or embed the python interpreter in a C/C++ application, the module is not loaded unless the script imports the readline module explicitly. (P.S. I used pmap(1) to verify whether libreadline.so is loaded at run time.)
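The same check can also be made from inside the interpreter. The following is a rough sketch, not the method used above: it assumes Python 3 and, for the second helper, a Linux-style /proc/self/maps file. Both helper names are made up for this example.

```python
import sys


def is_module_loaded(name):
    """Return True if the named module has already been imported."""
    return name in sys.modules


def mapped_libraries(substring, maps_path="/proc/self/maps"):
    """Return mapped file paths containing substring (Linux-style /proc only)."""
    found = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Fields: address perms offset dev inode pathname
                fields = line.split(None, 5)
                if len(fields) == 6 and substring in fields[5]:
                    found.add(fields[5].strip())
    except OSError:
        pass  # this platform has no such file
    return sorted(found)
```

Calling is_module_loaded("readline") and mapped_libraries("libreadline") at the end of a script gives a rough in-process counterpart of the pmap(1) check.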

I think only the readline extension module falls under the GPL terms, so it should be safe to build python with libreadline. :)

Using Gnu-iconv on Solaris and OpenSolaris

Gnu-iconv supports more encoding conversions than Solaris-iconv and performs better on some of them. For example, Solaris-iconv currently does not support conversions between GB18030 and UCS-2BE, UCS-4LE/BE, or UTF-16LE/BE, and the GB18030<->UTF-8 (UCS-2LE) conversions are twice as fast in Gnu-iconv. Gnu-iconv also has a star-shaped structure (with some exceptions) that uses UCS-4 as the intermediary encoding, whereas Solaris-iconv has a peer-to-peer structure (with aliases), which makes adding a new encoding really painful.
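To illustrate why the star-shaped design scales better, here is a small sketch in Python 3. Python's own codecs happen to be organized the same way, with the str type as the pivot, so they serve as a stand-in here; this is not Gnu-iconv's code, and both function names are invented for the example.

```python
def convert(data, source, target):
    """Convert bytes from one encoding to another through a Unicode pivot."""
    return data.decode(source).encode(target)


def converters_needed(n):
    """Return (star, peer_to_peer) converter counts for n encodings."""
    # Star: one decoder and one encoder per encoding.
    # Peer-to-peer: one converter per ordered pair of distinct encodings.
    return 2 * n, n * (n - 1)


gb = "测试".encode("gb18030")
print(convert(gb, "gb18030", "utf-16-le") == "测试".encode("utf-16-le"))  # True
print(converters_needed(10))  # (20, 90)
```

With a pivot, adding an eleventh encoding means writing two converters; with a pure peer-to-peer layout it means writing twenty.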

So you may want to use the Gnu-iconv library. For Solaris 10/Nevada, you can download and install gnu-libiconv from www.sunfreeware.com; for opensolaris, you can install SUNWgnu-libiconv with pkg(1) from pkg.opensolaris.org, but that package does not contain the header files.

You may notice that the function symbols in gnu-libiconv carry a "lib" prefix, e.g., iconv_open -> libiconv_open. So LD_PRELOAD and RUNPATH alone are not sufficient to replace the iconv(3) routines in libc. You also need to make sure your code includes the "iconv.h" from gnu-libiconv, so that it is compiled against the prefixed names.
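The naming difference is easy to probe at run time. The sketch below (Python 3) only checks which of the two names a loaded library object exports; resolve_iconv_open is a made-up helper, and lib is assumed to behave like a ctypes.CDLL handle, where looking up a missing symbol raises AttributeError.

```python
def resolve_iconv_open(lib):
    """Return the name under which lib exports iconv_open, or None."""
    # gnu-libiconv exports libiconv_open; a libc-style library exports iconv_open.
    for name in ("libiconv_open", "iconv_open"):
        if hasattr(lib, name):
            return name
    return None
```

With ctypes, lib could be something like ctypes.CDLL("libiconv.so.2"), where the file name is an assumption that varies between systems.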


Exchange changesets of mercurial between repositories/branches

There are several ways to exchange changesets of mercurial between repositories (branches).

  1. hg bundle/unbundle
     This works just like an offline hg pull/update, which means you cannot apply changesets from another repository unless the two repositories share common changesets.

  2. hg export/import
     For example: hg export -o changesets/patch%r REV1:REV2. This works for repositories that do not share changesets (they may come from different ancestors), but it does not work so well for merge revisions (changesets with multiple parents). Even with the --switch-parent option, you still need to edit the patch manually to remove the additional parent declaration in the comments.

  3. hg transplant extension
     So far the best solution, though I still hit a problem when transplanting one specific merge revision (other merge revisions went smoothly). For that one I used hg cat -r REV to output the relevant files, copied them to the new repository and committed, then skipped that revision in the transplant.
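The export/import route can be scripted. Here is a minimal sketch in Python 3 that only builds the hg command lines and, optionally, runs them; the helper names are invented for this example, out_dir is assumed to be an absolute path, and hg is assumed to be on PATH.

```python
import glob
import os
import subprocess


def export_command(rev_range, out_dir):
    """Build the hg export command that writes one patch file per revision."""
    # %r in the output spec expands to the revision number of each changeset.
    return ["hg", "export", "-o", os.path.join(out_dir, "patch%r"), rev_range]


def import_commands(out_dir):
    """Build one hg import command per exported patch, in revision order."""
    def revision(path):
        return int(os.path.basename(path)[len("patch"):])

    patches = sorted(glob.glob(os.path.join(out_dir, "patch*")), key=revision)
    return [["hg", "import", patch] for patch in patches]


def run(commands, repo):
    """Run each command inside the given repository directory."""
    for cmd in commands:
        subprocess.check_call(cmd, cwd=repo)
```

A typical use would be run([export_command("REV1:REV2", out_dir)], source_repo) followed by run(import_commands(out_dir), target_repo). Merge revisions still need the manual patch editing described above.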

The only 64-bit GtkApp on Solaris

I wanted to test the scim/scim-bridge gtk-im-modules with 64-bit applications. However, the gnome applications on Solaris are 32-bit, except this one:

/usr/demo/bin/{amd64,sparcv9}/gtkdemo.

Unlike Linux, which has separate distributions for 32-bit and 64-bit (for both kernel and userland), Solaris ships both together (the kernel loaded by default is 64-bit).
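One way to check whether a given binary is 32-bit or 64-bit is to read the class byte of its ELF header. The following Python 3 sketch does only that; elf_class is a made-up helper name, and this is not necessarily how the binaries above were checked.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: 32, 2: 64}.get(ident[4])
```

Running it over the files in a directory such as /usr/bin gives a quick inventory of which executables are 64-bit.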

Book review: 《Solaris应用程序设计》 (Solaris Application Programming)

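At a friend's request, I am writing a review of 《Solaris应用程序设计》. Since I only translated one chapter, this hardly counts as reviewing my own work, heh ...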


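Many readers mark a book down as soon as they see that it was translated by "so-and-so et al." With this book you can rest easy. It was translated collectively by a number of colleagues at the Sun China Engineering and Research Institute (I translated chapter 7), and every chapter was reviewed and revised by two other colleagues, so I am confident in its technical accuracy and translation quality. It is the latest guide to Solaris application development, it covers a broad range of topics in considerable depth, and it is truly a rare find.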

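Like most Solaris/OpenSolaris enthusiasts in China (including many of my colleagues here), I came to Solaris development from Linux. The Solaris development environment, including the compilers, debuggers, and many utilities, differs from that of Linux. When I first joined, I learned mainly by asking senior colleagues and by exploring on my own, and the knowledge was scattered across different documents. This book covers all of it comprehensively and includes a lot of up-to-date material. Had it been available as an introduction back then, learning would have taken half the effort. The book's focus is how to develop high-performance applications on Solaris, and it walks the reader through each aspect of performance optimization in application development. For a serious Solaris application developer, it is a must-read development guide. It is also of some value for optimizing applications on non-Solaris platforms.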

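"I'm doing just fine with Linux, so why should I use Solaris/OpenSolaris?" That may be the big question on your mind. Strictly speaking, what you use is a Gnu/Linux system; Linux is only the kernel. You depend far more on the Gnu system than on the Linux kernel. Apart from the kernel, the C library, and some utilities, Solaris (OpenSolaris in particular) is not very different from the many Linux distributions. UltraSparc T1/T2 + Solaris/OpenSolaris is a dream platform for deploying and operating web applications. The latest UltraSparc T2 chip has 8 cores with 8 threads per core, so the operating system sees as many as 64 "virtual" CPUs; it has huge memory bandwidth yet low power consumption, and it is even an open-source chip. Solaris on x86/x64 also performs well: industry-leading features such as DTrace, ZFS, and virtualization give strong support for deploying and maintaining your applications. And with MySql having just joined the Sun family, the SAMP platform can be expected to have very good prospects.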

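Apart from a few passages that are a little stiff or contain minor errors, the translation reads smoothly overall. The one regret is that the publisher did not use page-for-page translation, so the index in the appendix was dropped, which is a real pity for a reference book that needs to be consulted often. This seems to be the publisher's usual practice; the earlier Solaris Kernel Internals suffered the same fate.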

Build stardict-3.0.1 on OpenSolaris 2008.05

Before you start the build, make sure you have set up your build environment; you may refer to my post "Setup Indiana as Developer Desktop for Gnu/Gnome".

Download the source tarball from stardict.sf.net and apply the patch:


$ patch -p1 < stardict-3.0.1-on-ss12-patch.diff
$ ./autogen.sh --prefix=/usr --disable-festival --disable-espeak; make
# make install

The major problem in this port is related to my last entry, "Function Pointer as Template Parameter in SunStudio C++".

Python and UTF32/UCS4

Your python interpreter may have been compiled with --enable-unicode=ucs2, in which case the built-in unichr(i) function raises an exception if the given value is larger than 0xFFFF.

Here is how to detect the unicode support in your python build:

import sys
# 65535 (0xFFFF) on a ucs2 build, 1114111 (0x10FFFF) on a ucs4 build
print(sys.maxunicode)

The 'ucs2' here actually means utf16, which is a variable-length encoding, so you need a simple function to convert utf32/ucs4 to utf16. Here is an example code snippet:

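def ucs4chr(codepoint):
    """Like unichr(), but also handles codepoints above 0xFFFF on a ucs2 build."""
    try:
        return unichr(codepoint)
    except ValueError:
        # Only codepoints in the supplementary planes can become a surrogate pair.
        if not 0x10000 <= codepoint <= 0x10FFFF:
            raise
        hi, lo = divmod(codepoint - 0x10000, 0x400)
        return unichr(0xd800 + hi) + unichr(0xdc00 + lo)

def ucs4ord(s):
    """Like ord(), but also accepts a utf16 surrogate pair."""
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        hi, lo = ord(s[0]) - 0xd800, ord(s[1]) - 0xdc00
        # Both halves must fall inside their 10-bit surrogate ranges.
        if 0 <= hi < 0x400 and 0 <= lo < 0x400:
            return hi * 0x400 + 0x10000 + lo
    raise TypeError("ucs4ord() expected a valid ucs4 character")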
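The surrogate arithmetic can be checked in isolation. The sketch below is written for Python 3, where unichr() is gone and (since 3.3) there are no ucs2 builds, so it works on plain integers and uses the built-in utf-16 codec only to cross-check the result. The function names are made up for this example.

```python
import struct


def to_surrogates(codepoint):
    """Split a codepoint in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint is outside the supplementary planes")
    hi, lo = divmod(codepoint - 0x10000, 0x400)
    return 0xD800 + hi, 0xDC00 + lo


def from_surrogates(hi, lo):
    """Combine a (high, low) surrogate pair back into a codepoint."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000


hi, lo = to_surrogates(0x1F600)
print(hex(hi), hex(lo))                  # 0xd83d 0xde00
print(hex(from_surrogates(hi, lo)))      # 0x1f600
# Cross-check against the built-in utf-16 codec.
print(chr(0x1F600).encode("utf-16-be") == struct.pack(">HH", hi, lo))  # True
```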

Setup Indiana as Developer Desktop for Gnu/Gnome

After you install the Indiana May release (OpenSolaris 2008.05) on your box, you may need to install the following packages to set it up as a Gnu/Gnome developer desktop:

$ pkg install ss-dev SUNWxwinc SUNWxorg-headers SUNWgnome-common-devel \
SUNWperl-xml-parser SUNWiconv-unicode SUNWiconv-extra SUNWgit

It's a little strange that the gtk/gnome header files are shipped on the liveCD and installed by default, but the X11 headers (in SUNWxwinc) are not.

Besides that, you may still need the CBE. If you already have it installed on another machine, copying the installation directly to your box may just work. Alternatively, pkgadd(1) is still available for installing the SVR4 packages.