
How To Extract Text Between The Matching Pattern In Python

I am new to Python and want to use it to extract the text between matching patterns in each line of my tab-delimited text file (mydata). mydata.txt: Sequence

Solution 1:

If I'm understanding correctly, this should work for a given line of your data:

data = line.split("locus_tag=")[1].split("][db_xref")[0]

The idea is to split the string on locus_tag= and take the second element (everything after that marker), then split that string on ][db_xref and take the first element (everything before it).
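To see the two splits in action, here is a minimal sketch run on a single made-up line. The sample line is only an assumption about what mydata.txt looks like (the question does not show the full data), and extract_between is a hypothetical helper name, not something from the original answer.

```python
def extract_between(line, start="locus_tag=", end="][db_xref"):
    """Return the text between start and end, or None if start is missing."""
    if start not in line:
        return None
    # Everything after the first occurrence of start...
    tail = line.split(start, 1)[1]
    # ...up to the first occurrence of end (the whole tail if end is absent).
    return tail.split(end, 1)[0]


# Hypothetical sample line; the real file layout may differ.
sample = "seq1\t[gene=abc][locus_tag=XY_0001][db_xref=GeneID:123]"
print(extract_between(sample))  # XY_0001
```

Checking for the start marker first avoids the IndexError that [1] would raise on a line without locus_tag=.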

If you also need the outer loop, it could look like this:

# file_path is the path to your data file, e.g. "mydata.txt"
with open(file_path, 'r') as f:
    for line in f:
        if "locus_tag=" in line:
            data = line.split("locus_tag=")[1].split("][db_xref")[0]
            print(data)

Solution 2:

You can use re.search with positive lookbehind and positive lookahead patterns:

import re
...
for line in input_data:
    match = re.search(r'(?<=\[locus_tag=).*?(?=\]\[db_xref)', line)
    if match:
        print(match.group())
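For a version that runs as-is, here is a small sketch of the same lookbehind/lookahead approach applied to a few made-up lines. The sample lines are assumptions about the file layout, and the pattern uses a non-greedy .*? so the match stops at the first ][db_xref when a line has more than one.

```python
import re

# The lookbehind anchors the start and the lookahead anchors the end;
# neither is included in the matched text.
LOCUS_TAG = re.compile(r"(?<=\[locus_tag=).*?(?=\]\[db_xref)")

# Hypothetical sample lines; the real file layout may differ.
input_data = [
    "seq1\t[gene=abc][locus_tag=XY_0001][db_xref=GeneID:123]",
    "seq2\t[gene=def][protein=unknown]",  # no locus_tag, so it is skipped
    "seq3\t[locus_tag=XY_0003][db_xref=GeneID:456][db_xref=Other:9]",
]

tags = []
for line in input_data:
    match = LOCUS_TAG.search(line)
    if match:
        tags.append(match.group())

print(tags)  # ['XY_0001', 'XY_0003']
```

To read from the real file instead, input_data can be replaced by an open file object, since iterating over a file yields one line at a time.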
