
What Are All The Illegal_characters From Openpyxl?

We are running into a problem when parsing emails from Outlook with Python. Some emails contain characters that cannot be appended to an Excel worksheet using openpyxl.

Solution 1:

The output you are seeing is a regular expression (RE, or regex for short) that matches characters in certain ranges. The numbers in it are octal escapes, so \010 is character code 8, not 10. It has three parts:

First part of RE:

[\000-\010]

This set matches any character with a code from 0 to 8 (octal \000 to \010). These are all control characters, ranging from NUL (the null character) to BS (backspace).

Second part of RE:

[\013-\014]

Again, these are control characters, specifically codes 11 and 12 (octal \013 and \014): VT (vertical tab) and FF (form feed). Neither is printable.

Third part of RE:

[\016-\037]

This is the largest range: codes 14 to 31 (octal \016 to \037), from SO (shift out) to US (unit separator). Every character in it is also a control character; none of them is printable.
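To check the three ranges above, here is a small sketch that joins them into one pattern and lists which ASCII codes it matches. The names ILLEGAL_RE, illegal_codes and allowed_controls are just my own for illustration; the pattern is simply the three parts joined with "|".

```python
import re

# The three ranges discussed above, joined into one pattern.
# The escapes are octal: \010 is 8, \037 is 31.
ILLEGAL_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

# Every ASCII code point that the pattern matches.
illegal_codes = [c for c in range(128) if ILLEGAL_RE.match(chr(c))]

# Control characters below 32 that the pattern does NOT match.
allowed_controls = [c for c in range(32) if c not in illegal_codes]

print(illegal_codes)
print(allowed_controls)  # tab (9), line feed (10), carriage return (13)
```

The gaps between the ranges are deliberate: tab, line feed and carriage return are left out of the pattern.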

So the reason these characters count as illegal is that the RE covers only non-printable control characters. Printable ASCII runs from 32 (the space character) to 126. The pattern takes everything from \000 to \037 except tab (\011), line feed (\012) and carriage return (\015), so a string fails to append whenever it contains one of the remaining control characters.
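One way to get around it is to strip those characters before appending the row. This is only a sketch: strip_illegal and the sample row are made up for illustration, and the pattern is redefined locally so the snippet runs without openpyxl installed. As far as I know, openpyxl exposes the same pattern as ILLEGAL_CHARACTERS_RE in openpyxl.cell.cell, so you may be able to import it instead.

```python
import re

# Local copy of the three ranges discussed above (octal escapes).
ILLEGAL_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

def strip_illegal(value):
    """Remove the matched control characters from strings; pass other types through."""
    if isinstance(value, str):
        return ILLEGAL_RE.sub('', value)
    return value

# Hypothetical row parsed from an email: NUL and VT are removed, tab is kept.
row = ["Subject\x00 line", "body\x0bwith\ttab", 42]
clean_row = [strip_illegal(v) for v in row]
print(clean_row)
```

You would then pass clean_row to the worksheet's append call instead of the raw row. If you would rather keep a visible marker, replace the characters with a space or "?" instead of an empty string.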

Here is an ASCII table for reference: https://www.w3schools.com/charsets/ref_html_ascii.asp

I hope this helps!
