Skip to content Skip to sidebar Skip to footer

How To Build A Regular Vocabulary Of Emoticons In Python?

I have a list of codes of emoticons inside a file UTF32.red.codes in plain text. The plain content of the file is \U0001F600 \U0001F601 \U0001F602 \U0001F603 \U0001F604 \U0001F605

Solution 1:

Your code will be doing fine in python 3 (just fix print found to print(found)). However, in python 2.7 it won't work, as its re module has a known bug (See this thread and this issue).

If you still need python 2 version of code, just use regex module, which could be installed with pip2 install regex. Import it with import regex then, substitute all re. statements with regex. (i.e. regex.compile and regex.findall) and that's it. It should be working.


Solution 2:

This code works with python 2.7

import re
with open('UTF32.red.codes','rb') as emof:
    codes = [emo.decode('unicode-escape').strip() for emo in emof]
    emojis = re.compile(u"(%s)" % "|".join(map(re.escape,codes)))

search = ur'string to check \U0001F601'
found = emojis.findall(search)

print found

Post a Comment for "How To Build A Regular Vocabulary Of Emoticons In Python?"