Python and UTF32/UCS4

You python interpreter maybe compiled with --enable-unocide=ucs2, so that the built-in unichr(i) function will raise an exception, if the given value is larger than 0xFFFF.

How to detect the unicode support in your python build,

import sys
print sys.maxunicode

While the 'ucs2' here actually means utf16, which is a variable length encoding. And you need a simple function to convert utf32/ucs4 to utf16. Here is the example code snippet,

def ucs4chr(codepoint):
    try:
        return unichr(codepoint)
    except ValueError:
        hi, lo = divmod (codepoint-0x10000, 0x400)
        return unichr(0xd800+hi) + unichr(0xdc00+lo)

def ucs4ord(str):
    if len(str)==1:
        return ord(str)
    if len(str)==2:
        hi, lo = ord(str[0])-0xd800, ord(str[1])-0xdc00
        return hi*0x400+0x10000+lo
    raise TypeError("ucs4ord() expected a valid ucs4 character")

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

To submit your comment, click the image below where it asks you to...
Clickcha - The One-Click Captcha