double-array-trie and constructing with python

Double-Array-Trie is wildly used in dictionary constructing, and there are some implementations out there, such as, darts (a plain implementation, does not have tail pool), darts-clone, dastrie, libdatrie etc. While for a small trie (like pinyin table), an implementation without tail pool is good enough. I write a simple constructing program in python, you could have a look at the full source here.

It's basically a breadth-first traverse, get the head node from queue, find a proper base for it and mark the check array for its child nodes, then append the child nodes to queue, continue the above processing till the queue is empty.

class DATrie (object):
    ... ...

    def find_base (self, s, children, i=1):
        if s == 0:
            return 0

        if not children:
            return s

        while True:
            for ch in children:
                k = i + self.encode_character (ch)
                if self.base[k] or self.check[k] or k == s:
                    i += 1
                    break
            else:
                break

        return i
    ... ...

    def construct_from_trie (self, trie, with_value=True):
        nodes = [(trie.root, 0)]

        while nodes:
            trienode, s = nodes.pop(0)
            b = self.find_base (s, trienode.trans)
            self.base[s] = -b if trienode.val else b
            if with_value: self.value[s] = trienode.val

            for ch in trienode.trans:
                c = self.encode_character (ch)
                t = abs(self.base[s]) + c
                self.check[t] = s if s else -1

                nodes.append ((trienode.trans[ch], t))

        for i in xrange (self.encode_character (max(trie.root.trans))+1):
            if self.check[i] == -1:
                self.check[i] = 0

For constructing larger dictionaries, I'd recommend you to use darts-clone, dastrie (only for a static trie), or libdatrie (only supports 16-btis value data).

References:

  1. An Implementation of Double-Array Trie
  2. An Efficient Digital Search Algorithm by Using a Double-Array Structure.

One thought on “double-array-trie and constructing with python

  1. Pingback: What’s New in SunPinyin2 — Quanpin Segmentor

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

To submit your comment, click the image below where it asks you to...
Clickcha - The One-Click Captcha