List Of Unicode Character Names
Solution 1:
Nearly every assigned code point has a name, so you are effectively asking for the Unicode standard's list of code point names (as well as the list of name aliases, supported by Python 3.3 and up).
Each Python version supports a specific version of the Unicode standard; the unicodedata.unidata_version attribute tells you which one for a given Python runtime. The links above lead to the latest published Unicode version; replace UCD/latest in the URLs with the value of unicodedata.unidata_version for your Python version.
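As a rough sketch of that substitution, the snippet below swaps UCD/latest for the Unicode version of the running interpreter. The helper name versioned_url and the example URL are assumptions made for illustration; substitute whichever link you are actually using.

```python
import unicodedata

# Assumed example link; the helper works on any URL containing "UCD/latest".
LATEST_URL = "https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt"


def versioned_url(latest_url: str) -> str:
    """Replace 'UCD/latest' with the Unicode version this Python supports."""
    return latest_url.replace("UCD/latest", unicodedata.unidata_version)


print(unicodedata.unidata_version)
print(versioned_url(LATEST_URL))
```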
Per code point, the unicodedata.name() function can tell you the official name, and unicodedata.lookup() gives you the inverse (name to code point).
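A minimal sketch of the round trip, using U+2200 FOR ALL as the example character. The second argument to unicodedata.name() is the default returned when a code point has no name; without it, the function raises ValueError.

```python
import unicodedata

name = unicodedata.name("\u2200")     # code point -> name
char = unicodedata.lookup("FOR ALL")  # name -> code point

print(name)
print(f"U+{ord(char):04X}")

# Control characters such as U+0000 have no name, so the default is returned.
unnamed = unicodedata.name("\x00", "<no name>")
print(unnamed)
```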
Solution 2:
If you want a list of all Unicode character names, consider downloading the Unicode Character Database.
It is included in the base repositories of many Linux distributions (e.g. "unicode-ucd" on RHEL).
The package includes NamesList.txt, which contains the exhaustive list of Unicode character names.
Caution: NamesList.txt takes some time to download (it is larger than 1.5 MB).
Example:
21FE RIGHTWARDS OPEN-HEADED ARROW
21FF LEFT RIGHT OPEN-HEADED ARROW
@@ 2200 Mathematical Operators 22FF
@@+
@ Miscellaneous mathematical symbols
2200 FOR ALL
    = universal quantifier
2201 COMPLEMENT
    x (latin letter stretched c - 0297)
2202 PARTIAL DIFFERENTIAL
2203 THERE EXISTS
    = existential quantifier
2204 THERE DOES NOT EXIST
    : 2203 0338
2205 EMPTY SET
    = null set
    * used in linguistics to indicate a null morpheme or phonological "zero"
    x (latin capital letter o with stroke - 00D8)
    x (diameter sign - 2300)
    ~ 2205 FE00 zero with long diagonal stroke overlay form
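To pull only the names out of a file in this format, something like the following could work. It is a sketch, not a full NamesList.txt parser: it assumes a name line begins with a hexadecimal code point followed by whitespace (a tab in the actual file) and that every other line (block headers, aliases, cross references, notes) can be ignored. The function name parse_names and the sample lines are made up for illustration.

```python
import re

# A name line starts with a hexadecimal code point, then whitespace, then the name.
# "@" header lines and indented annotation lines do not match this pattern.
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})[ \t]+(\S.*)$")


def parse_names(lines):
    """Return a {code point: name} dict from NamesList.txt-style lines."""
    names = {}
    for line in lines:
        match = NAME_LINE.match(line.rstrip("\r\n"))
        if match:
            names[int(match.group(1), 16)] = match.group(2).rstrip()
    return names


sample = [
    "@@\t2200\tMathematical Operators\t22FF",
    "2200\tFOR ALL",
    "\t= universal quantifier",
    "2201\tCOMPLEMENT",
    "\tx (latin letter stretched c - 0297)",
]
print(parse_names(sample))
```

For the real file, pass an open file object instead of the sample list. Where the package installs NamesList.txt varies by distribution, so the path has to be looked up locally.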
Solution 3:
Yes, there is a way: go through all existing code points and call unicodedata.name() on each of them, like this:
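import unicodedata

names = []
for c in range(0, 0x10FFFF + 1):
    try:
        # unicodedata.name() expects a one-character string, not an int
        names.append(unicodedata.name(chr(c)))
    except ValueError:
        # Raised for code points that have no name
        pass
# Do something with names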
Solution 4:
For a given code point, you can use unicodedata.name. To get them all, you can work through all 1,114,112 code points (U+0000 to U+10FFFF) to see which have names.
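One way to do that walk, sketched here under the assumption that you want a dictionary keyed by code point (the variable name names_by_code_point is illustrative): pass a default to unicodedata.name() so that unnamed code points are skipped without exception handling.

```python
import sys
import unicodedata

# sys.maxunicode is 0x10FFFF, so this visits every code point exactly once.
names_by_code_point = {}
for code_point in range(sys.maxunicode + 1):
    name = unicodedata.name(chr(code_point), None)  # None when there is no name
    if name is not None:
        names_by_code_point[code_point] = name

print(len(names_by_code_point))
print(names_by_code_point[0x2200])
```

The number of entries depends on the Unicode version your Python was built against, so it differs between Python releases.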
Solution 5:
Just print them all:
import unicodedata

for i in range(0x110000):
    character = chr(i)
    # An empty default means code points without a name are skipped
    name = unicodedata.name(character, "")
    if name:
        print(f"{i:6} | 0x{i:04X} | {character} | {name}")