
Regular Expression For Different Forms Of People's Name Representation

I'm writing a Python regular expression that tries to capture people's names. They can be in the form first_name last_name or last_name, first_name. This is my regular expression for

Solution 1:

With a single pattern, you can do this only with the PyPI regex module, because it allows the same named capturing group to appear more than once in one pattern:

import regex
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    print(regex.search(r'(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)', s).groupdict())
# => {'last': 'last_name', 'first': 'first_name'}
# => {'last': 'last_name', 'first': 'first_name'}

See the Python demo.

Otherwise, if your input always has one of these two forms, you can swap the first and last name, drop the comma, and then simply split the string:

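import re
# s is one input string, e.g. "first_name last_name" or "last_name, first_name"
name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()
# name, surname => 'first_name', 'last_name' for either input form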

See another Python demo.
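
The swap-and-split idea can also be wrapped in a small helper. This is only a sketch: the function name split_name is hypothetical, and it assumes each name part is a single run of \w characters, as in the two example strings. Input that does not reduce to exactly two parts is rejected instead of being unpacked blindly.

```python
import re

def split_name(s):
    """Return (first, last) for 'first last' or 'last, first' input.

    Hypothetical helper; assumes each name part is one run of \\w characters.
    """
    # Rewrite 'last, first' as 'first last'; other strings pass through unchanged.
    normalized = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s.strip())
    parts = normalized.split()
    if len(parts) != 2:
        raise ValueError("unexpected name format: {!r}".format(s))
    return parts[0], parts[1]

for s in ["first_name last_name", "last_name, first_name"]:
    print(split_name(s))
```

Raising ValueError is a design choice for the sketch; returning None would work just as well if malformed input is expected.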

Another alternative: use simple numbered capturing groups with a regular alternation, and then concatenate the corresponding captures:

import re
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    m = re.search(r'(\w+),\s+(\w+)|(\w+)\s+(\w+)', s)
    if m:
        surname = "{}{}".format(m.group(1) or '', m.group(4) or '')
        name = "{}{}".format(m.group(2) or '', m.group(3) or '')
        print("{} {}".format(name, surname))
    else:
        print("No match")

Here, r'(\w+),\s+(\w+)|(\w+)\s+(\w+)' captures the last name in Group 1 or Group 4 and the first name in Group 2 or Group 3. Only one alternative can match, so in each pair one group is always None; that is why or '' is needed when concatenating the two groups to get the final value.
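
If you prefer named groups but want to stay with the standard re module, one possible variation is to give each alternative its own pair of names and merge them afterwards. This is a sketch rather than part of the original answer: the group names (last_a, first_a, first_b, last_b) and the function name parse_name are illustrative assumptions.

```python
import re

# The standard re module requires unique group names, so each
# alternative gets its own pair (all four names are illustrative).
PATTERN = re.compile(
    r'(?P<last_a>\w+),\s+(?P<first_a>\w+)'   # last_name, first_name
    r'|(?P<first_b>\w+)\s+(?P<last_b>\w+)'   # first_name last_name
)

def parse_name(s):
    """Return {'first': ..., 'last': ...}, or None if nothing matches."""
    m = PATTERN.search(s)
    if m is None:
        return None
    g = m.groupdict()
    # Only one alternative participates in a match; the other pair is None.
    return {
        'first': g['first_a'] or g['first_b'],
        'last': g['last_a'] or g['last_b'],
    }

for s in ["first_name last_name", "last_name, first_name"]:
    print(parse_name(s))
```

The merge step plays the same role as the or '' concatenation above, but the result is keyed by name instead of by group number.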
