
How To Read Large File With Unicode In Python 3

Hello, I have a large file that contains Unicode characters, and when I try to open it in Python 3 this is the error I get. File 'addRNC.py', line 47, in add_rnc() File 'a

Solution 1:

Your file actually contains invalid UTF-8.

When you say "contains unicode characters", you should be aware that Unicode doesn't specify how the characters are represented. So even if the file represents Unicode data, it could be in UTF-8, UTF-16 (UTF-16BE or UTF-16LE, each with or without a BOM), the deprecated UCS-2, or perhaps even one of the more esoteric forms...
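As a rough illustration of the BOM point, the sketch below peeks at the first bytes of a file and reports which BOM, if any, is present. The function name `sniff_bom` is made up for this example, and a missing BOM proves nothing: UTF-8 and the UTF-16BE/UTF-16LE forms are commonly written without one.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path):
    """Return a codec name suggested by the file's BOM, or None if there is no BOM.

    Hypothetical helper: it only inspects the first four bytes.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name
    return None
```

The returned names are codecs that consume the BOM themselves, so a result such as `"utf-16"` can be passed straight to `open(path, encoding=...)`.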

Double-check that the file is valid UTF-8. I'd bet that you do have a byte 0xD3 (11010011), which in UTF-8 must be the leading byte of a two-byte character, sitting in a follower position (in other words, 0xD3 immediately follows a byte whose binary representation begins with 11 [that is, 0xC0 or greater]).
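One way to run that check on a large file, sketched under the assumption that individual lines are short enough to hold in memory: read the file in binary mode, line by line, and try to decode each line as UTF-8. The helper name `find_invalid_utf8` is hypothetical. Splitting on the newline byte is safe here because 0x0A never occurs inside a multi-byte UTF-8 sequence.

```python
def find_invalid_utf8(path):
    """Return (line_number, byte_offset, byte_value) for the first malformed
    UTF-8 sequence in the file, or None if the whole file decodes cleanly.

    Hypothetical helper, not part of any library.
    """
    offset = 0  # number of bytes in the lines already checked
    with open(path, "rb") as f:
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the index, within this line, where the
                # malformed sequence begins.
                return line_number, offset + exc.start, raw[exc.start]
            offset += len(raw)
    return None
```

Python reports the position where the malformed sequence starts, so the byte that is actually out of place may be the one returned or the one right after it.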

The most likely reason for this is that your file contains non-ASCII characters, but isn't in UTF-8.
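If that is the case, passing the real encoding to `open()` is usually enough, and iterating over the file object keeps memory use flat even for a large file. The sketch below assumes the file is Windows-1252 (`cp1252`), which is only a guess; in that encoding the byte 0xD3 is "Ó". The function name `read_lines` is hypothetical.

```python
def read_lines(path, encoding="cp1252", errors="strict"):
    """Yield the lines of a text file one at a time, without the trailing newline.

    The default encoding is an assumption for illustration; replace it with
    whatever encoding the file was actually written in.
    """
    with open(path, encoding=encoding, errors=errors) as f:
        for line in f:  # the file object reads lazily, line by line
            yield line.rstrip("\n")
```

If the true encoding cannot be determined, `read_lines(path, encoding="utf-8", errors="replace")` will not raise; each undecodable byte is replaced with U+FFFD, at the cost of losing the original character.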
