How To Build A Regular Vocabulary Of Emoticons In Python?
I have a list of codes of emoticons inside a file UTF32.red.codes in plain text. The plain content of the file is \U0001F600 \U0001F601 \U0001F602 \U0001F603 \U0001F604 \U0001F605
Solution 1:
Your code will be doing fine in python 3 (just fix print found
to print(found)
). However, in python 2.7 it won't work, as its re
module has a known bug (See this thread and this issue).
If you still need python 2 version of code, just use regex
module, which could be installed with pip2 install regex
. Import it with import regex
then, substitute all re.
statements with regex.
(i.e. regex.compile
and regex.findall
) and that's it. It should be working.
Solution 2:
This code works with python 2.7
import re
with open('UTF32.red.codes','rb') as emof:
codes = [emo.decode('unicode-escape').strip() for emo in emof]
emojis = re.compile(u"(%s)" % "|".join(map(re.escape,codes)))
search = ur'string to check \U0001F601'
found = emojis.findall(search)
print found
Post a Comment for "How To Build A Regular Vocabulary Of Emoticons In Python?"