2009 ERI Chinese Near Year Party

一年到头,用上70-200的机会不多,新年晚会自然不可放过。不过好事多磨,我的小黑在家用得好好的,到单位就发现有时无法自动对焦了,搞得我前面若干张都是手动对焦的,糊的比较多。后来用着用着自己又好了,看来我的小黑和D700有些不和谐啊,郁闷...

double-array-trie and constructing with python

Double-Array-Trie is wildly used in dictionary constructing, and there are some implementations out there, such as, darts (a plain implementation, does not have tail pool), darts-clone, dastrie, libdatrie etc. While for a small trie (like pinyin table), an implementation without tail pool is good enough. I write a simple constructing program in python, you could have a look at the full source here.

It's basically a breadth-first traverse, get the head node from queue, find a proper base for it and mark the check array for its child nodes, then append the child nodes to queue, continue the above processing till the queue is empty.

class DATrie (object):
    ... ...
    def find_base (self, s, children, i=1):
        if s == 0:
            return 0
        if not children:
            return s
        while True:
            for ch in children:
                k = i + self.encode_character (ch)
                if self.base[k] or self.check[k] or k == s:
                    i += 1
                    break
            else:
                break
        return i
    ... ...
    def construct_from_trie (self, trie, with_value=True):
        nodes = [(trie.root, 0)]
        while nodes:
            trienode, s = nodes.pop(0)
            b = self.find_base (s, trienode.trans)
            self.base[s] = -b if trienode.val else b
            if with_value: self.value[s] = trienode.val
            for ch in trienode.trans:
                c = self.encode_character (ch)
                t = abs(self.base[s]) + c
                self.check[t] = s if s else -1
                nodes.append ((trienode.trans[ch], t))
        for i in xrange (self.encode_character (max(trie.root.trans))+1):
            if self.check[i] == -1:
                self.check[i] = 0

For constructing larger dictionaries, I'd recommend you to use darts-clone, dastrie (only for a static trie), or libdatrie (only supports 16-btis value data).

References:

  1. An Implementation of Double-Array Trie
  2. An Efficient Digital Search Algorithm by Using a Double-Array Structure.

08总结和09展望

自我总结一下,08年上半年的状态尚可,但下半年比较糟糕,不时会出现莫名其妙的失误,并且发现难于集中自己的注意力;除了bug fixing之外,工作上的进展也不理想。09年要做的就是两个字:专注!要集中注意力将SunPinyin的后续开发做好,并在NLP领域内继续学习和提高自己。