Noob Queries On Unicode And Str Methods In Python
Solution 1:
First we should be clear we are talking about Python 2 only. Python 3 is different.
- You're right. But if you write u"abcd" in a .py file, the encoding declared for the source file determines how the interpreter decodes your string.
You need to decode it first, then encode it, and then print. In Python 2, DON'T print unicode directly! Otherwise, if the system encodes it with an incompatible codec (like "ascii"), an exception will be raised. You have to do all of this explicitly.
The short answer is that "a" doesn't have to be represented as "\x61"; "a" is simply more readable. A longer answer: typically in the interactive shell, if you type a value and press enter, Python will show the repr() of your string. I think repr() tries to print everything in an ASCII representation. For "a", it's already ASCII, so it's output directly. For the str "é", it's a UTF-8 encoded byte stream, so Python escapes each byte and prints '\xc3\xa9'.
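To make the decode-then-encode advice concrete, here is a minimal sketch. It uses explicit b'' and u'' prefixes and escapes instead of a literal "é", so it should behave the same on Python 2.7 and Python 3. The variable names are only illustrative.

```python
# Bytes as they might arrive from a file or a UTF-8 terminal: "é" in UTF-8.
raw = b'\xc3\xa9'

# Decode in: turn the bytes into a unicode text object.
text = raw.decode('utf-8')

# Encode out: pick the target encoding explicitly before writing or printing.
utf8_out = text.encode('utf-8')      # the original two bytes again
latin1_out = text.encode('latin-1')  # a single byte, 0xe9

# An encoding that cannot represent the character raises an exception,
# which is what happens when the output encoding is "ascii".
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(ascii_failed)
```

The point of the sketch is only that the failure comes from the encode step, so choosing the codec yourself keeps it under your control.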
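A small sketch of the repr() behaviour described above, again written with b'' prefixes so it runs on both Python 2.7 and Python 3 (on Python 3 the output carries a leading b, because byte strings are a separate type there).

```python
ascii_bytes = b'a'
utf8_bytes = b'\xc3\xa9'  # UTF-8 bytes for "é"

# "\x61" and "a" are two spellings of the same one-byte string.
same_value = (b'\x61' == ascii_bytes)

# repr() leaves printable ASCII alone and escapes every other byte.
print(repr(ascii_bytes))  # 'a' on Python 2, b'a' on Python 3
print(repr(utf8_bytes))   # '\xc3\xa9' on Python 2, b'\xc3\xa9' on Python 3
```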
Solution 2:
I don't think Python does any automatic encoding or decoding on console I/O. Consider the following:
>>> 'é'
'\xc3\xa9'
>>> 'é'.decode('UTF-8')
u'\xe9'
You'll notice that \xe9 is the Unicode code point for 'LATIN SMALL LETTER E WITH ACUTE', while \xc3\xa9 is the byte sequence corresponding to the same character in UTF-8.
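If it helps, the difference between the code point and the UTF-8 bytes can be checked directly. This is just a sketch using the standard unicodedata module; it should run unchanged on Python 2.7 and Python 3.

```python
import unicodedata

ch = u'\xe9'  # one unicode character

code_point = ord(ch)         # the code point as an integer, 0xe9
name = unicodedata.name(ch)  # the official Unicode character name

utf8 = ch.encode('utf-8')            # the same character as UTF-8 bytes
byte_values = list(bytearray(utf8))  # the individual byte values

print(hex(code_point), name, [hex(b) for b in byte_values])
```

One character, one code point, but two bytes once it is encoded as UTF-8.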
Everything changes in Python 3, since all strings are Unicode. I'm not sure of the rules there.
Solution 3:
See http://www.python.org/dev/peps/pep-0263/ for how to specify the encoding of a Python source file. For the Python interpreter's I/O, there's the PYTHONIOENCODING environment variable.
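As a rough illustration of both settings: the first comment line below is a PEP 263 source encoding declaration, and the helper (a hypothetical name, not part of any library) starts a child interpreter with PYTHONIOENCODING set and asks it what sys.stdout.encoding is. It assumes the script is allowed to spawn a subprocess.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env)
    # Normalise the reported name (e.g. "UTF-8" vs "utf-8") via the codec registry.
    return codecs.lookup(out.decode('ascii').strip()).name


reported = child_stdout_encoding('utf-8')
print(reported)
```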
What OS do you use?
Solution 4:
- The statement word = u'foo' assigns a unicode string object, not a "hex representation". Unicode objects represent sequences of text characters. Also, it is wrong to think of decoding in this context. Unicode is not an encoding, nor does it "have" an encoding.
- Yes. Decode In: Encode Out.
- For the repr of a non-unicode string literal, Python will use sys.stdin.encoding; for the repr of a unicode string literal, Python will use "unicode_escape".
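To see what the "unicode_escape" codec produces, here is a short sketch; the sample text is made up for illustration, and the code should run on both Python 2.7 and Python 3.

```python
text = u'caf\xe9 \u20ac'  # "café" followed by a space and a euro sign

# Non-ASCII characters become backslash escapes; ASCII passes through untouched.
escaped = text.encode('unicode_escape')

# Decoding with the same codec gives back the original text.
round_trip = escaped.decode('unicode_escape')

print(escaped)
```

Characters below U+0100 come out as \xNN and larger ones as \uNNNN, which matches the escapes you see in the repr of a unicode object in the interactive shell.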