
Filtering a Txt File in Python

I have too many lines like this:

>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACG

Solution 1:

You can start by using Biopython to get a proper FASTA format parser: http://biopython.org/wiki/SeqIO

Then iterate over the records and do what you want with them. This saves you the time of writing a parser, and it keeps you from getting the format handling wrong.

Example from that very page:

from Bio import SeqIO
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

Instead of printing, build a dict of the form {record.id: len(record.seq)} and update an entry only when the new sequence is longer than the one already stored.
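A minimal sketch of that idea follows. The question is cut off, so this assumes the goal is to keep the longest sequence per gene, and that the gene ID is the first "|"-separated field of the header (ENSG00000100206 in the sample). The function names longest_per_gene and longest_per_gene_in_file are hypothetical helpers, not part of Biopython.

```python
def longest_per_gene(records):
    """Map each gene ID to (record id, length) of its longest record.

    `records` is any iterable of objects with `.id` and `.seq`,
    such as the SeqRecord objects yielded by Bio.SeqIO.parse.
    """
    longest = {}
    for record in records:
        # Assumption: the gene ID is the first "|"-separated header field.
        gene_id = record.id.split("|")[0]
        length = len(record.seq)
        # Store the record only if it is the first or the longest so far.
        if gene_id not in longest or length > longest[gene_id][1]:
            longest[gene_id] = (record.id, length)
    return longest


def longest_per_gene_in_file(path):
    """Apply longest_per_gene to a FASTA file parsed with Biopython."""
    # Imported here so longest_per_gene itself works without Biopython.
    from Bio import SeqIO
    return longest_per_gene(SeqIO.parse(path, "fasta"))
```

Calling longest_per_gene_in_file("example.fasta") would then return one entry per gene. If the header fields are laid out differently, or a different field should be the key, only the split line needs to change.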

