<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: Importing Sogou Input Method Cell Dictionaries</title> <atom:link href="http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/feed/" rel="self" type="application/rss+xml" /><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=%25e5%25af%25bc%25e5%2585%25a5sogou%25e8%25be%2593%25e5%2585%25a5%25e6%25b3%2595%25e7%259a%2584%25e7%25bb%2586%25e8%2583%259e%25e8%25af%258d%25e5%25ba%2593</link> <description>Yong Sun&#039;s Blog</description> <lastBuildDate>Wed, 08 Feb 2012 06:05:17 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3</generator> <item><title>By: yongsun</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3525</link> <dc:creator>yongsun</dc:creator> <pubDate>Mon, 21 Nov 2011 01:23:44 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3525</guid> <description>@大茶几の骨骸, duplicate words should simply be skipped.</description> <content:encoded><![CDATA[<p>@大茶几の骨骸, duplicate words should simply be skipped.</p> ]]></content:encoded> </item> <item><title>By: 大茶几の骨骸</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3520</link> <dc:creator>大茶几の骨骸</dc:creator> <pubDate>Fri, 18 Nov 2011 16:18:22 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3520</guid> <description>When a duplicate word is encountered, is the existing entry overwritten?</description> <content:encoded><![CDATA[<p>When a duplicate word is encountered, is the existing entry overwritten?</p> ]]></content:encoded> </item> <item><title>By: yongsun</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3373</link> <dc:creator>yongsun</dc:creator> <pubDate>Fri, 22 Jul 2011 12:26:41 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3373</guid> <description>@marissa, are you using ibus-sunpinyin or fcitx-sunpinyin on Linux? It looks like the sunpinyin user dictionary was not found under your HOME directory…</description> <content:encoded><![CDATA[<p>@marissa, are you using ibus-sunpinyin or fcitx-sunpinyin on Linux? It looks like the sunpinyin user dictionary was not found under your HOME directory…</p> ]]></content:encoded> </item> <item><title>By: marissa</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3372</link> <dc:creator>marissa</dc:creator> <pubDate>Thu, 21 Jul 2011 02:51:36 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3372</guid> <description>Traceback (most recent call last):
File &quot;import_sogou_celldict.py&quot;, line 81, in
main()
File &quot;import_sogou_celldict.py&quot;, line 78, in main
import_to_sunpinyin_user_dict (generator)
File &quot;/home/marissa/work/seg/sunpinyin_importer/importer.py&quot;, line 55, in import_to_sunpinyin_user_dict
userdict_path = userdict_path if userdict_path else get_userdict_path()
File &quot;/home/marissa/work/seg/sunpinyin_importer/importer.py&quot;, line 23, in get_userdict_path
raise &quot;Can not detect sunpinyin&#039;s userdict!&quot;
TypeError: exceptions must be old-style classes or derived from BaseException, not str</description> <content:encoded><![CDATA[<p>Traceback (most recent call last):<br
/> File "import_sogou_celldict.py", line 81, in<br
/> main()<br
/> File "import_sogou_celldict.py", line 78, in main<br
/> import_to_sunpinyin_user_dict (generator)<br
/> File "/home/marissa/work/seg/sunpinyin_importer/importer.py", line 55, in import_to_sunpinyin_user_dict<br
/> userdict_path = userdict_path if userdict_path else get_userdict_path()<br
/> File "/home/marissa/work/seg/sunpinyin_importer/importer.py", line 23, in get_userdict_path<br
/> raise "Can not detect sunpinyin's userdict!"<br
/> TypeError: exceptions must be old-style classes or derived from BaseException, not str</p> ]]></content:encoded> </item> <item><title>By: yongsun</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3182</link> <dc:creator>yongsun</dc:creator> <pubDate>Tue, 19 Apr 2011 05:42:38 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3182</guid> <description>@Isalahberlin This may be a bug in the script. Could you tell me which dictionary file (URL) you were importing?</description> <content:encoded><![CDATA[<p>@Isalahberlin This may be a bug in the script. Could you tell me which dictionary file (URL) you were importing?</p> ]]></content:encoded> </item> <item><title>By: Wao</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-3181</link> <dc:creator>Wao</dc:creator> <pubDate>Mon, 18 Apr 2011 23:09:50 +0000</pubDate> <guid
isPermaLink="false">http://yongsun.me/?p=1496#comment-3181</guid> <description>It would be great if importing the Sogou main dictionary were supported too~</description> <content:encoded><![CDATA[<p>It would be great if importing the Sogou main dictionary were supported too~</p> ]]></content:encoded> </item> <item><title>By: Isalahberlin</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-2992</link> <dc:creator>Isalahberlin</dc:creator> <pubDate>Fri, 18 Mar 2011 02:32:05 +0000</pubDate> <guid
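isPermaLink="false">http://yongsun.me/?p=1496#comment-2992</guid> <description>These import scripts work very well, but I still have three questions and would appreciate answers:
1. When I ran the Python script that imports Sogou dictionaries, I got the error messages below. I checked the original dictionary, and the affected entries look no different from the ones that imported normally (this is only a sample; there are many more):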
[ong&#039;neng&#039;jian] has un-recognized syllables, ignoring this record!
[i&#039;chan] has un-recognized syllables, ignoring this record!
[i&#039;sai&#039;ya&#039;bo&#039;lin] has un-recognized syllables, ignoring this record!
[hi&#039;wan&#039;fan] has un-recognized syllables, ignoring this record!
[iu&#039;mei&#039;xie] has un-recognized syllables, ignoring this record!
[u&#039;an&#039;zhao] has un-recognized syllables, ignoring this record!
[iang&#039;hang] has un-recognized syllables, ignoring this record!
[i&#039;li&#039;qie&#039;fu] has un-recognized syllables, ignoring this record!
The corresponding entries in the original dictionary:
gong’neng’jian 功能键
zi’chan 自产
yi’sai’ya’bo’lin 以塞亚伯林
chi’wan’fan 吃完饭
jiu’mei’xie 就没写
bu’an’zhao 不按照
liang’hang 两行
di’li’qie’fu 蒂里切夫
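Clearly, the first letter of each of these entries has been cut off in the error messages. So why are these entries reported as containing unrecognized syllables while other entries are fine? They look no different from the entries that were not truncated.
2. About importing the same dictionary more than once: if I forget that I have already imported a dictionary and import it into userdict again, will that add junk data to userdict and needlessly increase its size?
3. I have noticed something very odd: on my Mac, the size of the userdict file sometimes changes unpredictably. For example, before I imported 30 dictionaries it was 100M, but after importing them it actually got smaller, dropping to 70M. What is going on?</description> <content:encoded><![CDATA[<p>These import scripts work very well, but I still have three questions and would appreciate answers:</p><p>1. When I ran the Python script that imports Sogou dictionaries, I got the error messages below. I checked the original dictionary, and the affected entries look no different from the ones that imported normally (this is only a sample; there are many more):</p><p>[ong'neng'jian] has un-recognized syllables, ignoring this record!<br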
/> [i'chan] has un-recognized syllables, ignoring this record!<br
/> [i'sai'ya'bo'lin] has un-recognized syllables, ignoring this record!<br
/> [hi'wan'fan] has un-recognized syllables, ignoring this record!<br
/> [iu'mei'xie] has un-recognized syllables, ignoring this record!<br
/> [u'an'zhao] has un-recognized syllables, ignoring this record!<br
/> [iang'hang] has un-recognized syllables, ignoring this record!<br
/> [i'li'qie'fu] has un-recognized syllables, ignoring this record!</p><p>The corresponding entries in the original dictionary:</p><p>gong’neng’jian 功能键<br
/> zi’chan 自产<br
/> yi’sai’ya’bo’lin 以塞亚伯林<br
/> chi’wan’fan 吃完饭<br
/> jiu’mei’xie 就没写<br
/> bu’an’zhao 不按照<br
/> liang’hang 两行<br
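/> di’li’qie’fu 蒂里切夫</p><p>Clearly, the first letter of each of these entries has been cut off in the error messages. So why are these entries reported as containing unrecognized syllables while other entries are fine? They look no different from the entries that were not truncated.</p><p>2. About importing the same dictionary more than once: if I forget that I have already imported a dictionary and import it into userdict again, will that add junk data to userdict and needlessly increase its size?</p><p>3. I have noticed something very odd: on my Mac, the size of the userdict file sometimes changes unpredictably. For example, before I imported 30 dictionaries it was 100M, but after importing them it actually got smaller, dropping to 70M. What is going on?</p> ]]></content:encoded> </item> <item><title>By: Isalahberlin</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-2957</link> <dc:creator>Isalahberlin</dc:creator> <pubDate>Wed, 16 Mar 2011 06:47:26 +0000</pubDate> <guid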
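isPermaLink="false">http://yongsun.me/?p=1496#comment-2957</guid> <description>FIT? Better not to mention it. How did you end up with such a bunch of pig-like partners? It is far too hard to use, and I seriously doubt those people have any professional background. They use the sunpinyin core, yet they have messed up the word frequencies so badly that even the most obscure words are ranked first... The software responds painfully slowly, and for a mere input method it keeps showing the spinning beach ball with CPU usage at 90%... Their dictionary import tool is terrible, far worse than these Python scripts from the OP.</description> <content:encoded><![CDATA[<p>FIT? Better not to mention it. How did you end up with such a bunch of pig-like partners? It is far too hard to use, and I seriously doubt those people have any professional background. They use the sunpinyin core, yet they have messed up the word frequencies so badly that even the most obscure words are ranked first... The software responds painfully slowly, and for a mere input method it keeps showing the spinning beach ball with CPU usage at 90%... Their dictionary import tool is terrible, far worse than these Python scripts from the OP.</p> ]]></content:encoded> </item> <item><title>By: Isalahberlin</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-2956</link> <dc:creator>Isalahberlin</dc:creator> <pubDate>Wed, 16 Mar 2011 06:42:07 +0000</pubDate> <guid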
isPermaLink="false">http://yongsun.me/?p=1496#comment-2956</guid> <description>Also, regarding my question from two comments back: thanks for the answer, boss, I have solved it...</description> <content:encoded><![CDATA[<p>Also, regarding my question from two comments back: thanks for the answer, boss, I have solved it...</p> ]]></content:encoded> </item> <item><title>By: Isalahberlin</title><link>http://yongsun.me/2010/07/%e5%af%bc%e5%85%a5sogou%e8%be%93%e5%85%a5%e6%b3%95%e7%9a%84%e7%bb%86%e8%83%9e%e8%af%8d%e5%ba%93/comment-page-2/#comment-2955</link> <dc:creator>Isalahberlin</dc:creator> <pubDate>Wed, 16 Mar 2011 06:41:09 +0000</pubDate> <guid
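isPermaLink="false">http://yongsun.me/?p=1496#comment-2955</guid> <description>These import scripts work very well, but I still have three questions and would appreciate answers:
1. When I ran the Python script that imports Sogou dictionaries, I got the error messages below. I checked the original dictionary, and the affected entries look no different from the ones that imported normally (this is only a sample; there are many more):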
[ong&#039;neng&#039;jian] has un-recognized syllables, ignoring this record!
[i&#039;chan] has un-recognized syllables, ignoring this record!
[i&#039;sai&#039;ya&#039;bo&#039;lin] has un-recognized syllables, ignoring this record!
[hi&#039;wan&#039;fan] has un-recognized syllables, ignoring this record!
[iu&#039;mei&#039;xie] has un-recognized syllables, ignoring this record!
[u&#039;an&#039;zhao] has un-recognized syllables, ignoring this record!
[iang&#039;hang] has un-recognized syllables, ignoring this record!
[i&#039;li&#039;qie&#039;fu] has un-recognized syllables, ignoring this record!
The corresponding entries in the original dictionary:
gong&#039;neng&#039;jian 功能键
zi&#039;chan 自产
yi&#039;sai&#039;ya&#039;bo&#039;lin 以塞亚伯林
chi&#039;wan&#039;fan 吃完饭
jiu&#039;mei&#039;xie 就没写
bu&#039;an&#039;zhao 不按照
liang&#039;hang 两行
di&#039;li&#039;qie&#039;fu 蒂里切夫
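Clearly, the first letter of each of these entries has been cut off in the error messages. So why are these entries reported as containing unrecognized syllables while other entries are fine? They look no different from the entries that were not truncated.
2. About importing the same dictionary more than once: if I forget that I have already imported a dictionary and import it into userdict again, will that add junk data to userdict and needlessly increase its size?
3. I have noticed something very odd: on my Mac, the size of the userdict file sometimes changes unpredictably. For example, before I imported 30 dictionaries it was 100M, but after importing them it actually got smaller, dropping to 70M. What is going on?</description> <content:encoded><![CDATA[<p>These import scripts work very well, but I still have three questions and would appreciate answers:</p><p>1. When I ran the Python script that imports Sogou dictionaries, I got the error messages below. I checked the original dictionary, and the affected entries look no different from the ones that imported normally (this is only a sample; there are many more):</p><p>[ong'neng'jian] has un-recognized syllables, ignoring this record!<br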
/> [i'chan] has un-recognized syllables, ignoring this record!<br
/> [i'sai'ya'bo'lin] has un-recognized syllables, ignoring this record!<br
/> [hi'wan'fan] has un-recognized syllables, ignoring this record!<br
/> [iu'mei'xie] has un-recognized syllables, ignoring this record!<br
/> [u'an'zhao] has un-recognized syllables, ignoring this record!<br
/> [iang'hang] has un-recognized syllables, ignoring this record!<br
/> [i'li'qie'fu] has un-recognized syllables, ignoring this record!</p><p>The corresponding entries in the original dictionary:</p><p>gong'neng'jian 功能键<br
/> zi'chan 自产<br
/> yi'sai'ya'bo'lin 以塞亚伯林<br
/> chi'wan'fan 吃完饭<br
/> jiu'mei'xie 就没写<br
/> bu'an'zhao 不按照<br
/> liang'hang 两行<br
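/> di'li'qie'fu 蒂里切夫</p><p>Clearly, the first letter of each of these entries has been cut off in the error messages. So why are these entries reported as containing unrecognized syllables while other entries are fine? They look no different from the entries that were not truncated.</p><p>2. About importing the same dictionary more than once: if I forget that I have already imported a dictionary and import it into userdict again, will that add junk data to userdict and needlessly increase its size?</p><p>3. I have noticed something very odd: on my Mac, the size of the userdict file sometimes changes unpredictably. For example, before I imported 30 dictionaries it was 100M, but after importing them it actually got smaller, dropping to 70M. What is going on?</p> ]]></content:encoded> </item> </channel> </rss>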
