Filtering a Text File in Python
I have many records like this:
>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACG
Solution 1:
Start with Biopython, whose SeqIO module gives you a proper FASTA parser: http://biopython.org/wiki/SeqIO
Then iterate over the records and process each one as needed. This saves you the time of writing your own parser, and it also keeps you from getting the format subtly wrong (for example, sequences that wrap across several lines, as in the record above).
Example from that very page:
from Bio import SeqIO

# Iterate over every record in the FASTA file and print its ID
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
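Instead of printing, build a dict that maps a key to the sequence length, len(record), and update an entry only when a later record is longer. SeqRecord has no length attribute, so use len(record). Also note that record.id holds the whole pipe-delimited header up to the first whitespace, so if you want one entry per gene you need to split out the shared field, for example the gene ID with record.id.split("|")[0].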
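A minimal sketch of that idea, assuming you want to keep the longest record per gene and that the gene ID is the first "|"-separated field of the header. The helper names gene_key and longest_per_gene are hypothetical. The sketch stores the whole record rather than just its length, so the survivors can still be written out. To run without Biopython, the demo uses plain stand-in objects with .id and .seq attributes in place of SeqRecord objects.

```python
from types import SimpleNamespace


def gene_key(record_id):
    # Assumption: the gene ID is the first "|"-separated field of the header,
    # e.g. "ENSG00000100206" in "ENSG00000100206|ENST00000216024|DMC1|..."
    return record_id.split("|")[0]


def longest_per_gene(records, key=gene_key):
    """Keep only the longest record for each key.

    `records` may be any iterable of objects with an `.id` string and a
    `.seq` that supports len(), such as Biopython SeqRecord objects.
    """
    best = {}
    for record in records:
        k = key(record.id)
        # Store the record if the key is new or this sequence is strictly longer;
        # on a tie, the record seen first is kept
        if k not in best or len(record.seq) > len(best[k].seq):
            best[k] = record
    return best


# Demo with hypothetical stand-in records instead of real SeqRecords
demo = [
    SimpleNamespace(id="GENE1|TX1|A", seq="ACGT"),
    SimpleNamespace(id="GENE1|TX2|A", seq="ACGTACGT"),
    SimpleNamespace(id="GENE2|TX3|B", seq="AC"),
]
for gene, rec in longest_per_gene(demo).items():
    print(gene, rec.id, len(rec.seq))
```

With Biopython installed, you would pass SeqIO.parse("example.fasta", "fasta") directly to longest_per_gene and could save the result with SeqIO.write(best.values(), "longest.fasta", "fasta"). If your grouping field is different, pass your own key function.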