Double-Array-Trie is wildly used in dictionary constructing, and there are some implementations out there, such as, darts (a plain implementation, does not have tail pool), darts-clone, dastrie, libdatrie etc. While for a small trie (like pinyin table), an implementation without tail pool is good enough. I write a simple constructing program in python, you could have a look at the full source here.

It's basically a breadth-first traverse, get the head node from queue, find a proper base for it and mark the check array for its child nodes, then append the child nodes to queue, continue the above processing till the queue is empty.

class DATrie (object): ... ... def find_base (self, s, children, i=1): if s == 0: return 0 if not children: return s while True: for ch in children: k = i + self.encode_character (ch) if self.base[k] or self.check[k] or k == s: i += 1 break else: break return i ... ... def construct_from_trie (self, trie, with_value=True): nodes = [(trie.root, 0)] while nodes: trienode, s = nodes.pop(0) b = self.find_base (s, trienode.trans) self.base[s] = -b if trienode.val else b if with_value: self.value[s] = trienode.val for ch in trienode.trans: c = self.encode_character (ch) t = abs(self.base[s]) + c self.check[t] = s if s else -1 nodes.append ((trienode.trans[ch], t)) for i in xrange (self.encode_character (max(trie.root.trans))+1): if self.check[i] == -1: self.check[i] = 0

For constructing larger dictionaries, I'd recommend you to use darts-clone, dastrie (only for a static trie), or libdatrie (only supports 16-btis value data).

References:

Pingback: What’s New in SunPinyin2 — Quanpin Segmentor