How To Remove Non-alphanumeric Characters At The Beginning Or End Of A String

February 03, 2023

I have a list whose elements have unnecessary (non-alphanumeric) characters at the beginning or end of each string, for example 'cats--', where I want to get rid of the --. I tried: for i in

Solution 1:

def strip_nonalnum(word):
    if not word:
        return word  # nothing to strip
    for start, c in enumerate(word):
        if c.isalnum():
            break
    else:
        return ""  # no alphanumeric characters at all, e.g. "-"
    for end, c in enumerate(word[::-1]):
        if c.isalnum():
            break
    return word[start:len(word) - end]

print([strip_nonalnum(s) for s in thelist])

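The same can be done with a regular expression (note that \W does not match _, so leading or trailing underscores are kept):

import re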

def strip_nonalnum_re(word):
    return re.sub(r"^\W+|\W+$", "", word)
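
The two approaches above do not behave identically on every input. The following is a minimal sketch, not the answer's own code, that compares an isalnum()-based helper with the \W-based regex on a few edge cases; the names strip_ends_isalnum and strip_ends_regex and the sample strings are assumptions made for illustration.

```python
import re

def strip_ends_isalnum(word):
    # Hypothetical helper: keep everything between the first and the
    # last alphanumeric character (str.isalnum() is False for "_").
    indices = [i for i, c in enumerate(word) if c.isalnum()]
    if not indices:
        return ""  # nothing alphanumeric to keep
    return word[indices[0]:indices[-1] + 1]

def strip_ends_regex(word):
    # \W means "not a word character"; "_" counts as a word character.
    return re.sub(r"^\W+|\W+$", "", word)

samples = ["cats--", "_cats_", "--", "-"]
print([strip_ends_isalnum(s) for s in samples])  # ['cats', 'cats', '', '']
print([strip_ends_regex(s) for s in samples])    # ['cats', '_cats_', '', '']
```

The only difference on these samples is the underscore case, which matters if the input may contain identifiers or file names.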

Solution 2:

To remove one or more chars other than letters, digits and _ from both ends you may use

re.sub(r'^\W+|\W+$', '', '??cats--') # => cats

Or, if _ is to be removed, too, wrap \W into a character class and add _ there:

re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_')

See the Python demo:

import re
print( re.sub(r'^\W+|\W+$', '', '??cats--') )          # => cats
print( re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_') )  # => cats
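
When the same pattern is applied to many strings, it can be compiled once and reused. This is only a sketch under that assumption; the names EDGE_JUNK and strip_edges are made up for the example and do not come from the answer.

```python
import re

# Compiled once: runs of non-alphanumeric characters (including "_")
# anchored at the start or at the end of the string.
EDGE_JUNK = re.compile(r"^[\W_]+|[\W_]+$")

def strip_edges(words):
    # Hypothetical helper: strip both ends of every string in a list.
    return [EDGE_JUNK.sub("", w) for w in words]

print(strip_edges(['cats5--', '#!cats-%', '--the#!cats-%', '_??cats--_']))
# ['cats5', 'cats', 'the#!cats', 'cats']
```

Characters in the middle of a string, such as the #! in the third item, are left untouched because both alternatives are anchored.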

Solution 3:

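You can use a regular expression. Note that this pattern removes every non-word character in the string, not only those at the beginning or end. The function re.sub() takes three arguments:

The regular expression pattern
The replacement
The string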

Code:

import re

s = 'cats--'
output = re.sub("[^\\w]", "", s)

print(output)

Explanation:

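The part "\\w" matches any alphanumeric character or underscore (the backslash is doubled because the pattern is not a raw string).
[^x] matches any character that is not x, so [^\w] matches any non-word character.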
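
Because the pattern is not anchored, it matches non-word characters anywhere in the string. A short sketch of the difference against an anchored pattern, using a sample string chosen for illustration:

```python
import re

s = "--the#!cats-%"

# Unanchored: every non-word character is removed, including "#!".
everywhere = re.sub(r"[^\w]", "", s)

# Anchored with ^ and $: only leading and trailing runs are removed.
edges_only = re.sub(r"^\W+|\W+$", "", s)

print(everywhere)  # thecats
print(edges_only)  # the#!cats
```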

Solution 4:

I believe that this is the shortest non-regex solution:

text = "`23`12foo--=+"

while len(text) > 0 and not text[0].isalnum():
    text = text[1:]
while len(text) > 0 and not text[-1].isalnum():
    text = text[:-1]

print(text)  # 23`12foo
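
Each text[1:] or text[:-1] in the loops above copies the string, which can add up for long runs of unwanted characters. A possible variant, sketched here with the hypothetical function name strip_nonalnum_edges, moves two indices instead and slices once at the end.

```python
def strip_nonalnum_edges(text):
    # Hypothetical helper: advance start / retreat end until both
    # point just inside the alphanumeric core, then slice once.
    start, end = 0, len(text)
    while start < end and not text[start].isalnum():
        start += 1
    while end > start and not text[end - 1].isalnum():
        end -= 1
    return text[start:end]

print(strip_nonalnum_edges("`23`12foo--=+"))  # 23`12foo
print(repr(strip_nonalnum_edges("--")))       # ''
```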

Solution 5:

str.strip() works only if you know in advance which characters to strip. Its argument is treated as a set of characters, not as a substring.

>>> 'cats--'.strip('-')
'cats'
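
If the unwanted characters are limited to ASCII punctuation and whitespace, one option is to pass the standard library constants string.punctuation and string.whitespace to strip(). This is a sketch under that assumption; it will not remove non-ASCII symbols, and the sample list is made up for the example.

```python
import string

# strip() removes any leading/trailing character found in this set.
junk = string.punctuation + string.whitespace

words = ['cats--', '#!cats-%', '--the#!cats-%', ' cats5 \n']
stripped = [w.strip(junk) for w in words]
print(stripped)  # ['cats', 'cats', 'the#!cats', 'cats5']
```

Note that string.punctuation includes the underscore, so leading and trailing underscores are stripped as well.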

You could use re to get rid of the non-alphanumeric characters, but that is overkill here, IMO. With str.isalpha() you can test whether a character is alphabetic, so you only need to keep the characters that are:

>>> ''.join(char for char in '#!cats-%' if char.isalpha())
'cats'
>>> thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
>>> [''.join(c for c in e if c.isalpha()) for e in thelist]
['cats', 'cats', 'thecats', 'cats', 'cats']

Since you want to get rid of non-alphanumeric characters (and keep the digits), str.isalnum() is the better test:

>>> [''.join(c for c in e if c.isalnum()) for e in thelist]
['cats5', 'cats', 'thecats', '5cats', '5cats']

For this list, it is exactly the same result you would get with re (as in Christian's answer):

>>> import re
>>> [re.sub("[^\\w]", "", e) for e in thelist]
['cats5', 'cats', 'thecats', '5cats', '5cats']

However, if you want to strip non-alphanumeric characters only from the beginning and end of the strings, you should use another pattern, such as this one (see the re documentation):

>>> [''.join(re.search(r'^\W*(.+)(?!\W*$)(.)', e).groups()) for e in thelist]
['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
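
The search pattern above needs at least two characters to match, so for an input such as 'a' or '--' re.search() returns None and the .groups() call raises AttributeError. A possible alternative, sketched here with the hypothetical names INNER and strip_edges_safe, searches for the span from the first to the last alphanumeric character and falls back to an empty string. Unlike \W, the class [^\W_] also treats the underscore as a character to strip.

```python
import re

# First alphanumeric character, optionally followed by anything up to
# the last alphanumeric character (DOTALL lets "." cross newlines).
INNER = re.compile(r"[^\W_](?:.*[^\W_])?", re.DOTALL)

def strip_edges_safe(word):
    # Hypothetical helper: returns "" when nothing alphanumeric exists.
    match = INNER.search(word)
    return match.group(0) if match else ""

thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
print([strip_edges_safe(e) for e in thelist])
# ['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
print([strip_edges_safe(e) for e in ['a', '--', '']])
# ['a', '', '']
```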
