Python Regex, Find And Replace Second Tab Character

December 12, 2022

I am trying to find and replace the second tab character in a string using regex.

booby = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

This works fine: re.sub(

Solution 1:

You may be overthinking it a little.

>>> import re
>>> text = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
>>> re.sub(r'(\t[^\t]*)\t', r'\1###', text, count=1)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

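The pattern matches the first tab, then any run of non-tab characters, then the second tab. The first tab and the non-tab run are captured in group 1, so the replacement \1### puts them back and swaps only the second tab for ###. count=1 stops after the first match, so later tabs are left alone.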
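
To make the pieces of that pattern easier to see, here is a minimal sketch that spells it out with re.VERBOSE. The names SECOND_TAB and replace_second_tab are made up for this illustration and are not part of the original answer.

```python
import re

# Hypothetical names for illustration only.
SECOND_TAB = re.compile(
    r"""
    (           # group 1: the part we keep
        \t      #   the first tab
        [^\t]*  #   any run of non-tab characters
    )
    \t          # the second tab, which gets replaced
    """,
    re.VERBOSE,
)

def replace_second_tab(text, replacement="###"):
    # A function replacement inserts `replacement` literally,
    # so backslashes in it are not treated as group references.
    return SECOND_TAB.sub(lambda m: m.group(1) + replacement, text, count=1)

text = 'Joe Bloggs\tNULL\tNULL\tNULL\r\n'
print(repr(replace_second_tab(text)))
```

If the string has fewer than two tabs, the pattern does not match and the text comes back unchanged.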

Solution 2:

>>> re.sub(r'^((?:(?!\t).)*\t(?:(?!\t).)*)\t', r'\1###', booby)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

You were almost there: add \1 before ### in the replacement so that the captured text before the second tab is kept.

Because of the comments, here is another way to do it without regex:

>>> booby.replace("\t", "###", 2).replace("###", "\t", 1)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
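
One caveat with the double replace: it assumes the replacement text does not already occur before the second tab. The sketch below shows that case and one possible alternative based on str.find; the function name replace_second_tab_by_index is hypothetical and not from the original answer.

```python
def replace_second_tab_by_index(text, replacement="###"):
    # Locate the first tab, then search for the next one after it.
    first = text.find("\t")
    if first == -1:
        return text  # no tab at all
    second = text.find("\t", first + 1)
    if second == -1:
        return text  # only one tab
    # Splice the replacement in place of the second tab.
    return text[:second] + replacement + text[second + 1:]

tricky = 'a###b\tc\td'  # already contains the placeholder
double_replace = tricky.replace("\t", "###", 2).replace("###", "\t", 1)
by_index = replace_second_tab_by_index(tricky)

print(repr(double_replace))  # the pre-existing ### is turned into a tab
print(repr(by_index))        # only the second tab changes
```

When the input is known never to contain the replacement text, the double replace is fine and shorter.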

Solution 3:

With regex

This is the shortest regex I could find:

import re
booby = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
print(re.sub(r'(\t.*?)\t', r'\1###', booby, count=1))

It uses the non-greedy .*? so that the match stops at the next tab instead of swallowing later ones. It outputs:

Joe Bloggs  NULL###NULL NULL    NULL    NULL    NULL    NULL    NULL
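
To see why the non-greedy quantifier matters, this small sketch compares it with the greedy form on a shortened version of the string. The variable names lazy and greedy are just for illustration.

```python
import re

booby = 'Joe Bloggs\tNULL\tNULL\tNULL\r\n'

# Non-greedy: .*? stops at the very next tab, so the second tab is replaced.
lazy = re.sub(r'(\t.*?)\t', r'\1###', booby, count=1)

# Greedy: .* runs as far as it can and backtracks to the LAST tab on the line.
greedy = re.sub(r'(\t.*)\t', r'\1###', booby, count=1)

print(repr(lazy))
print(repr(greedy))
```

With the greedy form, the last tab on the line is replaced rather than the second one.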

With split and join

The regex can get ugly if you need to replace a tab at some other position. For the general case, you can use split and join:

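n = 2       # replace the n-th separator (1-based)
sep = '\t'
cells = booby.split(sep)
# Rejoin the first n cells and the rest, with "###" where the n-th tab was.
print(sep.join(cells[:n]) + "###" + sep.join(cells[n:]))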

It outputs:

Joe Bloggs  NULL###NULL NULL    NULL    NULL    NULL    NULL    NULL
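
The split and join idea can be wrapped in a small helper. This is only a sketch: the name replace_nth_separator and the choice to return the text unchanged when there is no n-th separator are assumptions, not part of the original answer.

```python
def replace_nth_separator(text, n, replacement, sep="\t"):
    """Replace the n-th (1-based) occurrence of sep in text."""
    cells = text.split(sep)
    # len(cells) - 1 separators exist; bail out if the n-th one is missing.
    if n < 1 or n >= len(cells):
        return text
    return sep.join(cells[:n]) + replacement + sep.join(cells[n:])

row = 'Joe Bloggs\tNULL\tNULL\tNULL\r\n'
print(repr(replace_nth_separator(row, 2, "###")))
print(repr(replace_nth_separator(row, 3, "###")))
```

Without the bounds check, an n larger than the number of tabs would append the replacement to the end of the string, which is usually not what is wanted.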
