How To Extract Text Between The Matching Pattern In Python
I am new to Python and want to use it to extract the text between matching patterns in each line of my tab-delimited text file (mydata). mydata.txt: Sequence
Solution 1:
If I'm understanding correctly, this should work for a given line of your data:
data = line.split("locus_tag=")[1].split("][db_xref")[0]
The idea is to split the string on locus_tag=, take the second element, then split that string on ][db_xref and take the first element.
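To make the two splits concrete, here is a minimal sketch on a made-up line. The sample string and the tag value ABC_0001 are assumptions for illustration only; the real contents of mydata.txt are not shown in the question.

```python
# Hypothetical sample line; not taken from the asker's file.
line = "seq1\t[gene=abc] [locus_tag=ABC_0001][db_xref=GeneID:1]"

# First split: keep everything after "locus_tag=".
after_tag = line.split("locus_tag=")[1]

# Second split: keep everything before "][db_xref".
data = after_tag.split("][db_xref")[0]

print(data)
```

Note that indexing with [1] raises an IndexError when a line has no locus_tag= in it, so the check in the loop matters.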
If you want help with the outer loop it could look like:
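# file_path is the path to your data file, e.g. "mydata.txt"
with open(file_path, 'r') as f:
    for line in f:
        # Check for the full marker so the split below always has a second element
        if "locus_tag=" in line:
            data = line.split("locus_tag=")[1].split("][db_xref")[0]
            print(data)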
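If some lines might have locus_tag= but no ][db_xref after it, a slightly more defensive variant could use str.partition instead of split. This is only a sketch: the function name extract_locus_tag and its default arguments are hypothetical, not part of the original answer.

```python
def extract_locus_tag(line, start="locus_tag=", end="][db_xref"):
    """Return the text between start and end, or None if either marker is missing."""
    # partition returns (before, separator, after); separator is "" when not found.
    _, found_start, rest = line.partition(start)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end)
    return value if found_end else None


# Hypothetical sample lines for illustration.
print(extract_locus_tag("seq1\t[locus_tag=ABC_0001][db_xref=GeneID:1]"))
print(extract_locus_tag("seq2\tno markers here"))
```

Returning None for lines without both markers lets the caller decide whether to skip them or report them.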
Solution 2:
You can use re.search with positive lookbehind and positive lookahead patterns:
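import re
...
for line in input_data:
    # Lookbehind matches just after "[locus_tag=", lookahead stops before "][db_xref";
    # .*? is non-greedy, so the match ends at the first "][db_xref" on the line
    match = re.search(r'(?<=\[locus_tag=).*?(?=\]\[db_xref)', line)
    if match:
        print(match.group())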
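The same extraction can also be written with a capture group instead of lookarounds, which some people find easier to read. The sketch below is an alternative, not the answer's own method; the names LOCUS_TAG and find_locus_tags and the sample lines are hypothetical.

```python
import re

# Capture the text between "[locus_tag=" and "][db_xref".
# (.*?) is non-greedy, so it stops at the first "][db_xref" on the line.
LOCUS_TAG = re.compile(r'\[locus_tag=(.*?)\]\[db_xref')


def find_locus_tags(lines):
    """Yield the captured locus tag for each line that contains one."""
    for line in lines:
        match = LOCUS_TAG.search(line)
        if match:
            yield match.group(1)


# Hypothetical sample lines for illustration.
sample = [
    "seq1\t[gene=abc] [locus_tag=ABC_0001][db_xref=GeneID:1]",
    "seq2\tno tag on this line",
    "seq3\t[locus_tag=ABC_0002][db_xref=GeneID:2]",
]
tags = list(find_locus_tags(sample))
print(tags)
```

Compiling the pattern once with re.compile avoids re-parsing it for every line, which may help a little on large files.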