Donkey on Leopard

The latest official build of aMule is 2.1.3. You can connect to the ED2K/Kad network with it, but you may find the download speed very slow. aMule-cvs is the solution; fortunately, a Chinese developer, HDFreeleader, builds aMule-cvs snapshots weekly.

The newest build is amule-cvs-20080322. It requires libjpeg.62.dylib in /usr/local/lib. I recommend installing libjpeg via MacPorts and then creating a symbolic link there.

How do tagging and filtering work for localization installation?

As far as I know, IPS supports tagging files (presumably as an attribute of the 'file' action?), and from what I can tell from the documentation, a tag is a key-value pair, e.g., arch=i386. The Indiana team may prefer to bundle localization content, such as messages and online help, into the base package. E.g., the French messages and docs of openoffice would not be shipped as individual packages but inside the openoffice@2.3.1 package, and you would set a filter to install them.

I'm curious, though: can I specify package patterns in a filter? Such as

(arch=i386 | arch=generic) &
    (packages=openoffice & locale=fr & (doc=true | message=true) |
     packages=all-installed & locale=fr & (doc=true | message=true) |
     packages=lang-french-support,ttf-french-fonts,...)

With this filter, I could install the docs and messages for openoffice and for all already-installed packages (which may already include openoffice), as well as the specified packages 'lang-french-support' and 'ttf-french-fonts'.

Certainly, lang-french-support and ttf-french-fonts should also carry the locale=fr tag, but I may not want to install all the fr l10n content for packages that are not installed.
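To make the question concrete, here is a minimal sketch of how such a tag filter could be evaluated against the tags of a single file. It is purely hypothetical: the tuple-based filter representation and the `matches` function are my own illustration, not the IPS filter syntax or API, and the `packages=` part is left out because package scoping is exactly what I am asking about.

```python
# Hypothetical model, NOT the IPS API: a filter is a nested tuple of
# ("and", ...), ("or", ...) or ("tag", key, value) nodes.

def matches(flt, tags):
    """Return True if the tag dictionary of one file satisfies the filter."""
    kind = flt[0]
    if kind == "tag":
        _, key, value = flt
        return tags.get(key) == value
    if kind == "and":
        return all(matches(sub, tags) for sub in flt[1:])
    if kind == "or":
        return any(matches(sub, tags) for sub in flt[1:])
    raise ValueError("unknown filter node: %r" % (kind,))

# (arch=i386 | arch=generic) & locale=fr & (doc=true | message=true)
french_l10n = (
    "and",
    ("or", ("tag", "arch", "i386"), ("tag", "arch", "generic")),
    ("tag", "locale", "fr"),
    ("or", ("tag", "doc", "true"), ("tag", "message", "true")),
)

# Example tag sets for two files (made up for illustration).
french_help = {"arch": "generic", "locale": "fr", "doc": "true"}
german_help = {"arch": "generic", "locale": "de", "doc": "true"}

print(matches(french_l10n, french_help))  # True
print(matches(french_l10n, german_help))  # False
```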

I also assume that the sections or groups in the IPS GUI are actually filters. Could filters be installed or updated through IPS updates, or could an end user import a filter?

I sent the above questions to pkg-discuss, but there has been no response yet :(

Several tips for porting GNU software to Solaris

1. Resolve the dependency on gnu-gettext

In most cases, gettext(3C) on Solaris fulfills the requirements of your application. You can make the following change in configure.in (or configure.ac):

-AM_GNU_GETTEXT
+AM_GLIB_GNU_GETTEXT
+LTLIBINTL=
+AC_SUBST(LTLIBINTL)

The source package may ship a complete copy of gnu-gettext in its source tree (normally in a directory named 'intl'); remove it from 'SUBDIRS' in the top-level Makefile.am. Sometimes there is also an 'm4' directory in the source tree that contains macro files for checking GNU libraries or GCC compiler options; in that case, remove the option '-I m4' from 'ACLOCAL_AMFLAGS' in the top-level Makefile.am.

Then run the following steps to update the m4 macros and the configure script:

glib-gettextize --force
aclocal $ACLOCAL_FLAGS
autoheader
libtoolize -c --automake
automake --add-missing
autoconf

One more note: gnu-gettext cannot read message catalogs compiled by Solaris' msgfmt (/usr/bin/msgfmt), but Solaris' gettext works fine with catalogs compiled by GNU msgfmt.

2. Build socket programs

You may find that the commonly used macro 'SUN_LEN' is not defined on Solaris. Add the following definition to your header file:

+#if defined(sun) && !defined(SUN_LEN)
+#define SUN_LEN(su) (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
+#endif

And before you run the configure script, set LDFLAGS as follows:

export LDFLAGS=-lsocket

3. 0-sized array member in C struct

struct Foo {int bar; char data[0];};

-char data[0];
+char data[];    //change the 0-sized array to flexible array

Note that, according to the C99 standard, a flexible array member can only be placed at the end of a structure, and this change does not affect the layout or size of the original data structure. (Thanks tchaikov for providing the perfect solution!) If the 0-sized array member is not at the tail, however, you may have to use a 'union', which requires changing the code that accesses it.
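The claim that a trailing array member adds nothing to the size of the structure can be checked without a C compiler. The sketch below uses Python's ctypes, which follows the platform C layout rules; ctypes cannot spell `data[]`, so a zero-length array stands in for it here, and `Header` is a comparison type I made up that contains only the fixed part.

```python
import ctypes

class Header(ctypes.Structure):
    # Only the fixed part of struct Foo.
    _fields_ = [("bar", ctypes.c_int)]

class Foo(ctypes.Structure):
    # Fixed part plus a trailing zero-length array (stand-in for data[]).
    _fields_ = [("bar", ctypes.c_int),
                ("data", ctypes.c_char * 0)]

# The trailing member starts right after 'bar' and occupies no space.
print(ctypes.sizeof(Header), ctypes.sizeof(Foo), Foo.data.offset)
```

Both sizes should be equal, and the offset of `data` should equal the size of the fixed part.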

4. struct initialization

struct point {int x, y, z;};
- struct point x = {x:2, z:3};
+ struct point x = {.x=2, .z=3}; // C99 extension, not supported
                                 // by the Sun Studio C++ compiler

5. alloca(3C) on Solaris

You need to include alloca.h in every source file that calls alloca(3C).

6. wchar_t

Do NOT assume a wide char is always a UCS4 character. It's true only in UTF-8 locales on Solaris.

7. Use gcc if the source uses too many gcc extensions

As a last resort, use /usr/sfw/bin/gcc. The Sun Studio C compiler and gcc are ABI-compatible, but the C++ compilers are not. If you are building the package on the SPARC platform, GCC4SS gives better performance than gcc.

Using elfedit(1) to add needed library dependencies on Solaris

elfedit(1) is a very cool utility for modifying ELF files, e.g., the runpath/rpath. Here is a real example that adds a needed library:

# elfedit -e 'dyn:value -add -s NEEDED libX11.so.4' /usr/openwin/lib/locale/common/xomCTL.so.2

To verify:

$ elfedit -r -e 'dyn:tag needed' /usr/openwin/lib/locale/common/xomCTL.so.2

Please note (quoted from the manpage):

The desired string must already exist in the dynamic string table, or
there must be enough reserved space within this section for the new
string to be added. Older objects may not have the extra space.

Use the following command to check:

elfedit -r -e 'dyn:tag DT_SUNW_STRPAD' file

Python: mmap and array

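I need to access a very large array in a Python program, where each item is an (int, int, float, int) record. Storing it directly in a list takes a huge amount of memory (not only is every number an object, each tuple is an object too). Python provides the array module for storing numeric values more efficiently, but it supports only a single data type per array; for example, you cannot create an array object like this: a = array.array('2lfl').

I thought of storing the data in a file and accessing it through mmap. Apart from mmap, I don't know whether Python offers another way to get a block of raw memory, and mmap has some advantages in performance and efficiency. After some back and forth, I ended up with the following code: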

import mmap
import os
import struct
import tempfile

class MMArray:
    """Growable array of fixed-size binary records kept in a memory-mapped file."""
    __file = __mem = __fname = __path = None
    __realsize = __capsize = 0
    def __init__(self, type='B', fname=None, capsize=1024*1024):
        # 'type' is a struct format string; each element is one packed record.
        self.__elmsize = struct.calcsize(type)
        if not fname:
            # No file given: back the array with a temporary file.
            fno, self.__fname = tempfile.mkstemp("-mmarray", "pyslm-")
            self.__path = self.__fname
            self.__file = os.fdopen(fno, "w+b")
            self.__enlarge(capsize)
        else:
            self.fromfile(fname)
    def fromfile(self, fname):
        if not os.path.exists(fname):
            raise IOError("The file '%s' does not exist!" % fname)
        fsize = os.path.getsize(fname)
        if fsize == 0:
            raise ValueError("The size of file '%s' is zero!" % fname)
        # Release the current mapping before switching files.
        if self.__mem is not None:
            self.__mem.close()
            self.__mem = None
        if self.__file is not None:
            self.__file.close()
            self.__file = None
        self.__file = open(fname, "r+b")
        self.__path = fname
        self.__mem = mmap.mmap(self.__file.fileno(), fsize)
        self.__realsize = self.__capsize = fsize // self.__elmsize
    def tofile(self, fname):
        if os.path.abspath(fname) == os.path.abspath(self.__path):
            raise ValueError("Can not dump the array to the currently mapped file!")
        bsize = self.__realsize * self.__elmsize
        tf = open(fname, "wb")
        tf.write(self.__mem[:bsize])
        tf.close()
    def __enlarge(self, capsize):
        if self.__capsize >= capsize:
            return
        self.__capsize = capsize
        # Grow the file by writing one byte at its new last position
        # (writing an empty string would not extend the file at all).
        self.__file.seek(self.__elmsize * self.__capsize - 1)
        self.__file.write(b'\0')
        self.__file.flush()
        if self.__mem is not None: self.__mem.close()
        self.__mem = mmap.mmap(self.__file.fileno(), self.__file.tell())
    def __del__(self):
        # Unmap first, then drop the unused capacity from the file.
        if self.__mem is not None: self.__mem.close()
        if self.__file is not None:
            self.__file.truncate(self.__realsize * self.__elmsize)
            self.__file.close()
        # Only the temporary file is removed, never a user-supplied one.
        if self.__fname: os.remove(self.__fname)
    def __check(self, buf):
        # Elements are byte strings (plain str on Python 2) of exactly one record.
        if not isinstance(buf, bytes) or len(buf) != self.__elmsize:
            raise ValueError("Not a byte string, or the buffer size is incorrect!")
    def __getitem__(self, idx):
        if idx < 0 or idx >= self.__realsize:
            raise IndexError(idx)
        return self.__access(idx)
    def __setitem__(self, idx, buf):
        if idx < 0 or idx >= self.__realsize:
            raise IndexError(idx)
        self.__check(buf)
        self.__access(idx, buf)
    def __access(self, idx, buf=None):
        start = idx * self.__elmsize
        end = start + self.__elmsize
        if buf is None: return self.__mem[start:end]
        self.__mem[start:end] = buf
    def size(self):
        return self.__realsize
    def append(self, buf):
        self.__check(buf)
        if self.__realsize >= self.__capsize:
            # max() covers a mapped file that is smaller than one record.
            self.__enlarge(max(1, self.__capsize * 2))
        self.__access(self.__realsize, buf)
        self.__realsize += 1
    def __iter__(self):
        for i in range(0, self.__realsize):
            yield self.__access(i)
    def truncate(self, tsize):
        if self.__realsize >= tsize:
            self.__realsize = tsize

Of course, there is still a lot to improve, such as supporting indexing from the end (i.e., index < 0), slicing, and so on.
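For comparison, here is a minimal sketch of the same idea without any class: the (int, int, float, int) records are packed with the struct module straight into an anonymous memory map, which is one more way to get a raw block of memory in Python. The format string "=iifi" (standard sizes, no padding) and the helper names `make_buffer`, `put` and `get` are my own choices for illustration, not part of the code above.

```python
import mmap
import struct

# '=' selects standard sizes without padding:
# two 4-byte ints, one 4-byte float, one 4-byte int.
RECORD = struct.Struct("=iifi")

def make_buffer(count):
    """Anonymous memory map (no backing file) big enough for `count` records."""
    return mmap.mmap(-1, count * RECORD.size)

def put(buf, idx, record):
    """Pack one (int, int, float, int) tuple into slot `idx`."""
    RECORD.pack_into(buf, idx * RECORD.size, *record)

def get(buf, idx):
    """Unpack slot `idx` back into a tuple."""
    return RECORD.unpack_from(buf, idx * RECORD.size)

buf = make_buffer(1000)
put(buf, 0, (1, 2, 0.5, 3))
put(buf, 999, (7, 8, 1.25, 9))
print(get(buf, 0))    # (1, 2, 0.5, 3)
print(get(buf, 999))  # (7, 8, 1.25, 9)
```

Each record takes 16 bytes here, compared with a tuple object plus four number objects in a plain list. Note that the float is stored in single precision, so values that are not exactly representable come back rounded.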

The power of Mercurial Queues extension

When I work in a local Mercurial workspace to fix a bug or add a new feature, I like to track my modifications with version control, so I do a local check-in at every milestone. But when I push the changes to the parent workspace, all the changesets (even trivial ones, e.g., fixing a typo :$) end up in the parent workspace.

I once used the most stupid approach: keeping two local copies, one for development and another for pushing to the parent. Then I found a tip for concatenating multiple changesets into one, but it still needs an extra clone. Finally, I met the MQ extension.

Refer to the tutorial, and you will find it is really close to what you want.

Here I list the most important tips:

  • To initialize the patch versioned repository:
    hg qinit -c

  • To convert all applied MQ patches into permanent changesets (note that after the conversion, you may not roll back the last changeset):
    hg qdelete -r qbase:qtip

  • To synchronize with upstream (which is much simpler than the steps in MqMerge and does not introduce an extra "merge marker" changeset):
    hg qpop -a
    hg pull -u
    hg qpush -a

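Performance comparison of python-rbtree and the built-in dict

Python's built-in dict (dictionary) class uses a hash algorithm, so its keys are not ordered. std::map and std::set in C++ use a balanced binary tree (usually a red-black tree), so their keys are ordered. After some searching on the web, I found python-rbtree, a red-black tree module implemented in a mix of C and Pyrex.

I wrote a very simple test program and ran it on Solaris x86 + python 2.4.4, using dict and rbtree in turn to insert two million records (the key is 3 integers and the value is 1 integer; you can probably guess what I am working on :) ). After the dict insertion finished, I called dict.keys().sort() to sort its keys (that is, Python's built-in list sort, which is Timsort rather than quicksort). The result: the two approaches use a comparable amount of memory (around 200 MB), but the hash-based approach is more than twice as fast. With five million records the result is about the same: comparable memory use, and the hash-based approach is twice as fast.

At least at this order of magnitude, the built-in dict performs better. I also tried another red-black tree implementation in pure Python, RBTree.py, and the result was disappointing: with a large number of records, it seemed unable to produce correct results at all.

Conclusion: Python's dict can be trusted!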
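The dict half of such a test can be sketched as follows. This is not my original test program, only a minimal reconstruction of the idea: the record count, the key range and the function name `build_and_sort` are assumptions, and the rbtree half is left out because python-rbtree is a third-party module.

```python
import random
import time

def build_and_sort(n, seed=0):
    """Insert n records keyed by 3-integer tuples into a dict, then sort the keys."""
    rng = random.Random(seed)
    table = {}
    start = time.time()
    for _ in range(n):
        key = (rng.randrange(1000), rng.randrange(1000), rng.randrange(1000))
        table[key] = table.get(key, 0) + 1   # the value is a single integer
    keys = sorted(table)                      # ordered view of the hashed keys
    return keys, time.time() - start

keys, elapsed = build_and_sort(100000)
print("%d distinct keys in %.2f s" % (len(keys), elapsed))
```

Raising the count to the millions reproduces the scale of the test; the timing depends on the machine, so no number is given here.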

Google released Android SDK preview

Google released the Android SDK preview today; have a look at the feature overview video. From the application development demonstration and the technical presentations (video 1, 2, and 3), you can see that Java (as a programming language) is almost everywhere, but it runs on a different virtual machine: the Dalvik Virtual Machine, a register-based virtual machine (the JVM is stack-based) that is optimized for mobile devices. Another interesting thing is that it supports the OpenGL ES 1.0 specification, and the "Surface Manager" can seamlessly composite 2D and 3D graphic layers from multiple applications.
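To illustrate the difference between the two machine styles, here is a toy sketch that computes c = a + b both ways. The opcodes and the two interpreter functions are invented for illustration only; they are not real JVM or Dalvik bytecode.

```python
def run_stack(program, local_vars):
    """Stack machine: operands travel through an implicit operand stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "load":
            stack.append(local_vars[instr[1]])
        elif op == "store":
            local_vars[instr[1]] = stack.pop()
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(op)
    return local_vars

def run_register(program, registers):
    """Register machine: each instruction names its operands directly."""
    for op, dst, src1, src2 in program:
        if op == "add":
            registers[dst] = registers[src1] + registers[src2]
        else:
            raise ValueError(op)
    return registers

# c = a + b, with a, b, c in slots 0, 1, 2
stack_program = [("load", 0), ("load", 1), ("add",), ("store", 2)]
register_program = [("add", 2, 0, 1)]

print(run_stack(stack_program, [2, 3, 0]))        # [2, 3, 5]
print(run_register(register_program, [2, 3, 0]))  # [2, 3, 5]
```

The stack version needs four instructions where the register version needs one wider instruction, which is the usual trade-off between the two designs.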

Here is the architecture of the Android platform:

You may download the SDK here.