Understanding File Iteration In Python
Solution 1:
The variable line is a line of text, represented as a string. Strings are immutable, but that's not an issue for manipulating them; all variables in Python are references, and assigning to a variable points the reference to a new object. (In C++, you can't change where a reference points.) Iterating over a file iterates over its lines, so on each iteration, line refers to a new string representing the next line of the input file.
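A minimal sketch of that rebinding, assuming Python 3. Here io.StringIO stands in for an open text file, and the three sample lines are the ones used elsewhere on this page; the names seen, first and stripped are only for illustration.

```python
import io

# io.StringIO behaves like an open text file for the purposes of iteration.
f = io.StringIO("3 10 7 8\n2 9 8 3\n4 1 4 2\n")

seen = []
for line in f:
    # On every pass, `line` is bound to a new str object for the next line.
    seen.append(line)

first = seen[0]
# rstrip() does not modify `first`; it returns a new string.
stripped = first.rstrip("\n")
```

Each element of seen keeps its trailing newline, and first is left untouched by the call to rstrip().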
If you're familiar with range-based for loops or other languages' for-each constructs, that's how Python's for works. The loop variable is not a counter; you can't do
    if line == 2:
because line isn't the index of the line; it's the line itself. You could do
    for i, line in enumerate(f):
        if i == 2:
            do_stuff_with(line)
            break  # No need to load the rest of the file
Note that file is the name of a builtin in Python 2, so it's best not to use that name for your own variables.
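To see that breaking out early leaves the rest of the input unread, here is a small sketch. It assumes Python 3 and uses io.StringIO in place of a real file; the four sample lines and the names picked and remaining are made up for illustration.

```python
import io

# Stand-in for an open file with four lines.
f = io.StringIO("zero\none\ntwo\nthree\n")

picked = None
for i, line in enumerate(f):
    if i == 2:          # enumerate counts from 0, so this is the third line
        picked = line
        break           # stop here; later lines are never requested

# Whatever the loop did not consume is still available.
remaining = f.read()
```

The loop stops as soon as the wanted line is found, so only the lines up to that point are read.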
Solution 2:
Suppose you have the same file as in the question:
3 10 7 8\n
2 9 8 3\n
4 1 4 2\n
There are many methods that operate on a file object.
In Python, you can read a file character by character, C style:
    with open('/tmp/test.txt', 'r') as fin:  # fin is a 'file object'
        while True:
            ch = fin.read(1)
            if not ch:
                break
            print(ch, end='')  # end='' suppresses the extra newline
You can read the whole file as a single string:
    with open('/tmp/test.txt', 'r') as fin:
        data = fin.read()
        print(data)
As enumerated lines:
    with open('/tmp/test.txt', 'r') as fin:
        for i, line in enumerate(fin):
            print(i, line)
As a list of strings:
    with open('/tmp/test.txt', 'r') as fin:
        data = fin.readlines()
The idiom of looping over a file object:
    for line in fin:  # 'fin' is a file object, the result of open()
        print(line)
yields the same lines as (although readlines() loads the whole file into a list first):
    for line in fin.readlines():
        print(line)
and is similar to:
    for line in 'line 1\nline 2\nline 3'.splitlines():
        print(line)
Once you get used to Python-style loops (or the for-each loops in Perl, Objective-C, or Java) that loop over the elements of something, you use them without thinking about it much.
If you want the index of each item, use enumerate.
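The reading styles above can be compared side by side. The following is a sketch assuming Python 3; it writes the sample data to a temporary file (the directory and file name are generated for illustration and are not the /tmp/test.txt from the answer) and reads it back three ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")  # hypothetical file name
    with open(path, "w") as fout:
        fout.write("3 10 7 8\n2 9 8 3\n4 1 4 2\n")

    with open(path) as fin:
        whole = fin.read()                 # entire file as one string

    with open(path) as fin:
        as_list = fin.readlines()          # list of lines, newlines kept

    with open(path) as fin:
        iterated = [line for line in fin]  # one line at a time

# splitlines() works on a plain string and drops the line endings.
split_up = whole.splitlines()
```

Iterating and readlines() produce the same strings, each ending in a newline, while splitlines() returns the lines without their endings.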
Solution 3:
In each iteration, the line variable is filled with the contents of the next line read from the file. So you'll have "3 10 7 8" in the first iteration, "2 9 8 3" in the second iteration, and so on (each with its trailing newline).
To get the numbers separately, use the string's split method.
So comparing line with 2 doesn't make sense. If you want to identify line numbers, you can try:
    lineNumber = 0
    for line in file:
        print(line)
        if lineNumber == 1:  # counting starts at 0, so 1 is the second line
            print("that was the second line!")
        lineNumber += 1
As suggested in the comment, you can simplify this by using enumerate:
    for lineNumber, line in enumerate(file):
        print(line)
        if lineNumber == 1:  # enumerate counts from 0, so 1 is the second line
            print("that was the second line!")
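Putting split and enumerate together, a sketch (assuming Python 3) of turning each line into a list of integers might look like this. io.StringIO stands in for the open file, and the names rows and second_line_numbers are illustrative only.

```python
import io

# Stand-in for the open input file from the question.
f = io.StringIO("3 10 7 8\n2 9 8 3\n4 1 4 2\n")

rows = []
# start=1 makes the counter match human line numbering.
for line_number, line in enumerate(f, start=1):
    # split() with no arguments splits on any whitespace and
    # discards the trailing newline.
    numbers = [int(token) for token in line.split()]
    rows.append((line_number, numbers))

# rows is a list, so index 1 is the entry for line number 2.
second_line_numbers = rows[1][1]
```

Passing start=1 to enumerate avoids the off-by-one confusion between list indices and line numbers.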
Solution 4:
In Python, you can iterate straight over a file. The best way of doing this is with a with statement, as in:
    with open("myfile.txt") as f:
        for line in f:
            pass  # do stuff to each line in the file
The lines are strings representing each line (separated by newlines) in the file. If you only want to operate on the second line, you could do something like this:
    with open("myfile.txt") as f:
        list_of_file = list(f)
        second_line = list_of_file[1]  # indices start at 0
If you then want to access part of the second line, you can split it by whitespace into another list, like so:
second_number_in_second_line = second_line.split()[1]
With regard to memory, iterating through the file directly does not read it all into memory; turning it into a list does. If you want to access individual lines without doing so, use itertools.islice.
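A minimal sketch of the itertools.islice approach, assuming Python 3, with io.StringIO standing in for the open file. It picks out the second line without building a list of all lines.

```python
import io
from itertools import islice

# Stand-in for open("myfile.txt").
f = io.StringIO("3 10 7 8\n2 9 8 3\n4 1 4 2\n")

# islice(f, 1, 2) skips one line, yields the next, then stops.
# next(..., None) returns None if the file has fewer than two lines.
second_line = next(islice(f, 1, 2), None)

second_number = None
if second_line is not None:
    second_number = second_line.split()[1]
```

Only the lines up to and including the requested one are read; note that split() returns strings, so second_number would still need int() if a number is wanted.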
Solution 5:
You can iterate over a file of any size with the code you have shown, and it should not consume any significant amount of memory beyond the size of the longest single line.
As for how it works under the hood, you could dive into the source code for Python itself to learn the gory details. At a higher level, just consider that the implementers of Python's file objects chose to implement line-by-line iteration as a feature of the class.
Many of the collection data types and I/O interfaces in Python implement some form of iteration, so the for construct is the most common type of looping in Python. You can iterate over lists, tuples, and sets (by item), strings (by character), and dictionaries (by key), and many classes (including those in the standard library as well as those from third parties) implement the iterator protocol to facilitate such usage.
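As a rough illustration of the iterator protocol, the sketch below (assuming Python 3) first drives a file-like object by hand with iter() and next(), then defines a small class that a for loop can consume in the same way. Countdown is a hypothetical class invented for this example, and io.StringIO stands in for a real file.

```python
import io

f = io.StringIO("3 10 7 8\n2 9 8 3\n")

it = iter(f)                 # a file object is its own iterator
first = next(it)             # what a for loop does on each pass
second = next(it)
exhausted = next(it, None)   # default returned once the lines run out


class Countdown:
    """Hypothetical class implementing the iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


counted = list(Countdown(3))
```

Any object with these two methods can be used in a for loop, which is how file objects offer line-by-line iteration.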