Regular Expression For Different Forms Of People's Name Representation
I'm writing a Python regular expression that tries to capture people's names. They can be in the form first_name last_name or last_name, first_name. This is my regular expression for
Solution 1:
You can do this in a single pattern only with the PyPI regex module, since it allows the same named capturing group to appear more than once in one pattern:
import regex
sz = ["first_name last_name", "last_name, first_name"]
for s in sz:
    print(regex.search(r'(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)', s).groupdict())
# => {'last': 'last_name', 'first': 'first_name'}
# => {'last': 'last_name', 'first': 'first_name'}
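To illustrate why the third-party module is needed for that pattern, here is a quick check (a sketch, not part of the original answer) showing that the standard-library re module refuses to compile a pattern that reuses group names. The helper name compiles_with_stdlib_re is made up for this example.

```python
import re

# Same pattern as above: the group names `first` and `last` are each used twice.
PATTERN = r'(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)'

def compiles_with_stdlib_re(pattern):
    """Hypothetical helper: True if the standard `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # `re` reports a redefinition of the group name here.
        return False

print(compiles_with_stdlib_re(PATTERN))
# => False
```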
Alternatively, if your input always has one of these two forms, you can swap the first and last name, drop the comma, and then simply split the string:
import re
name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()
# => first_name last_name
# => first_name last_name
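The swap-and-split approach could be wrapped in a small function along these lines. This is only a sketch under the assumption that every input has exactly two name parts; the function name split_name and the ValueError for other inputs are additions for illustration.

```python
import re

def split_name(s):
    """Hypothetical helper: return (first, last) for 'first last' or 'last, first'."""
    # Rewrite 'last, first' as 'first last'; other strings pass through unchanged.
    normalized = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s)
    parts = normalized.split()
    if len(parts) != 2:
        # Assumption: anything other than two parts is treated as invalid input.
        raise ValueError("expected exactly two name parts: {!r}".format(s))
    return parts[0], parts[1]

for s in ["first_name last_name", "last_name, first_name"]:
    print(split_name(s))
# => ('first_name', 'last_name')
# => ('first_name', 'last_name')
```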
Another alternative: use simple numbered capturing groups with a regular alternation, and then concatenate the corresponding captures:
import re
sz = ["first_name last_name", "last_name, first_name"]
for s in sz:
    m = re.search(r'(\w+),\s+(\w+)|(\w+)\s+(\w+)', s)
    if m:
        surname = "{}{}".format(m.group(1) or '', m.group(4) or '')
        name = "{}{}".format(m.group(2) or '', m.group(3) or '')
        print("{} {}".format(name, surname))
    else:
        print("No match")
Here, r'(\w+),\s+(\w+)|(\w+)\s+(\w+)' has last names in Group 1 or 4 and first names in Group 2 or 3. After joining these groups, you get your match (in each pair, one group is always None, thus or '' is required when concatenating).
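The same idea can also be written with named groups in the standard re module, as long as each alternative uses its own group names. The sketch below is not from the original answer; the group names first_a, last_a, first_b, last_b and the helper parse_name are invented for illustration.

```python
import re

# Each alternative gets distinct group names, because `re` forbids reusing a name.
NAME_RE = re.compile(
    r'(?P<last_a>\w+),\s+(?P<first_a>\w+)'   # "last, first"
    r'|(?P<first_b>\w+)\s+(?P<last_b>\w+)'   # "first last"
)

def parse_name(s):
    """Hypothetical helper: return {'first': ..., 'last': ...}, or None if no match."""
    m = NAME_RE.search(s)
    if m is None:
        return None
    g = m.groupdict()
    # Only one alternative participates in a match; the other groups are None.
    return {
        'first': g['first_a'] or g['first_b'],
        'last': g['last_a'] or g['last_b'],
    }

for s in ["first_name last_name", "last_name, first_name"]:
    print(parse_name(s))
# => {'first': 'first_name', 'last': 'last_name'}
# => {'first': 'first_name', 'last': 'last_name'}
```

Compared with numbered groups, this keeps the merging step readable, at the cost of a slightly longer pattern.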