How To Reveal Unicode's Numeric Value Property

July 28, 2023

'\u00BD' # ½
'\u00B2' # ²

I am trying to understand isdecimal() and isdigit() better; for this it's necessary to understand Unicode numeric value properties. How would I see the

Solution 1:

To get the numeric value of a character, you can use the unicodedata.numeric() function:

>>> import unicodedata
>>> unicodedata.numeric('\u00BD')
0.5
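As a small sketch of how this behaves for characters with and without a numeric value: unicodedata.numeric() raises ValueError when the character has none, unless a default is passed as the second argument. The helper name numeric_or_none is hypothetical and only used for illustration.

```python
import unicodedata


def numeric_or_none(ch):
    """Return the Unicode numeric value of ch as a float, or None if it has none."""
    # The second argument is the default returned instead of raising ValueError.
    return unicodedata.numeric(ch, None)


# ½, ¾ and ² carry a numeric value; the letter x does not.
for ch in '\u00BD\u00BE\u00B2x':
    print(repr(ch), numeric_or_none(ch))
```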

Use the ord() function to get the integer codepoint, optionally in combination with format() to produce a hexadecimal value:

>>> ord('\u00BD')
189
>>> format(ord('\u00BD'), '04x')
'00bd'
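When inspecting several characters, it can help to pair the codepoint with the character's official name. The following is a minimal sketch; describe is a hypothetical helper, and the 'U+XXXX' formatting is just one common convention.

```python
import unicodedata


def describe(ch):
    """Return (codepoint in U+XXXX form, Unicode character name) for ch."""
    code = 'U+{:04X}'.format(ord(ch))
    # unicodedata.name() accepts a default for characters without a name.
    name = unicodedata.name(ch, '<unnamed>')
    return code, name


for ch in '\u00BD\u00B2':
    print(ch, *describe(ch))
```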

You can get the character's general category with unicodedata.category(), which you'd then need to check against the documented categories:

>>> unicodedata.category('\u00BD')
'No'

where 'No' stands for Number, Other.

However, a number of characters for which .isnumeric() returns True are in the category Lo (Letter, Other). The Python unicodedata module only gives you access to the general category and relies on the str.isdigit() and str.isnumeric() methods, the unicodedata.digit() and unicodedata.numeric() functions, and their relatives to handle the additional numeric properties.
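To illustrate the point about category Lo, here is a short sketch using the CJK ideograph 三 (U+4E09, "three"). The choice of character is mine, not from the original answer; the results depend on the Unicode data bundled with your Python version.

```python
import unicodedata

ch = '\u4E09'  # 三, CJK ideograph meaning "three"

info = {
    'category': unicodedata.category(ch),  # general category only
    'isdecimal': ch.isdecimal(),
    'isdigit': ch.isdigit(),
    'isnumeric': ch.isnumeric(),
    'numeric': unicodedata.numeric(ch),
}
print(info)
```

The general category alone would classify this character as a letter, so a check based on unicodedata.category() misses it, while str.isnumeric() does not.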

If you want a precise list of all numeric Unicode characters, the canonical source is the Unicode Character Database, a series of text files that define the whole of the standard. The DerivedNumericType.txt file (v. 6.3.0) gives you a 'view' on that database specific to the numeric properties; it tells you at the top how the file is derived from other data files in the standard. The same goes for the DerivedNumericValues.txt file, which lists the exact numeric value per codepoint.
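If downloading the data files is not convenient, a rough equivalent of the per-codepoint value listing can be built from the database bundled with Python. This is only a sketch: it reflects the Unicode version reported by unicodedata.unidata_version, which may differ from the version of the files mentioned above, and build_numeric_table is a hypothetical helper name.

```python
import sys
import unicodedata


def build_numeric_table():
    """Map every codepoint that has a numeric value to that value (a float)."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        # Passing None as the default avoids ValueError for non-numeric characters.
        value = unicodedata.numeric(chr(cp), None)
        if value is not None:
            table[cp] = value
    return table


numeric_table = build_numeric_table()
print('Unicode version:', unicodedata.unidata_version)
print('Codepoints with a numeric value:', len(numeric_table))
```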

Solution 2:

The docs explicitly specify the relation between these str methods and the Numeric_Type property:

def is_decimal(c):
    """Whether input character is Numeric_Type=decimal."""
    return c.isdecimal()  # it means General Category=Decimal Number in Python


def is_digit(c):
    """Whether input character is Numeric_Type=digit."""
    return c.isdigit() and not c.isdecimal()


def is_numeric(c):
    """Whether input character is Numeric_Type=numeric."""
    return c.isnumeric() and not c.isdigit() and not c.isdecimal()

Example:

>>> for c in '\u00BD\u00B2':
...     print("{}: Numeric: {}, Digit: {}, Decimal: {}".format(
...         c, is_numeric(c), is_digit(c), is_decimal(c)))
... 
½: Numeric: True, Digit: False, Decimal: False
²: Numeric: False, Digit: True, Decimal: False

I'm not sure Decimal Number and Numeric_Type=Decimal will always be identical.

Note: '\u00B2' is not decimal because superscripts are explicitly excluded by the standard; see section 4.6, Numerical Value (Unicode 6.2).
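The effect of this exclusion can be checked directly. The sketch below queries the three unicodedata lookup functions for '\u00B2' and, as a practical consequence, shows that int() rejects it while accepting a decimal digit from another script. The helper int_accepts and the choice of ARABIC-INDIC DIGIT THREE (U+0663) are my own illustration, not part of the original answer.

```python
import unicodedata

two = '\u00B2'  # ² SUPERSCRIPT TWO

# Each lookup returns the given default (None) when the property is absent.
decimal_value = unicodedata.decimal(two, None)  # no decimal value
digit_value = unicodedata.digit(two, None)      # has a digit value
numeric_value = unicodedata.numeric(two, None)  # has a numeric value


def int_accepts(s):
    """Return True if int() can parse s, False if it raises ValueError."""
    try:
        int(s)
        return True
    except ValueError:
        return False


print(decimal_value, digit_value, numeric_value)
print(int_accepts(two), int_accepts('\u0663'))
```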
