Why does Python not see all the rows in a file?
I count the number of rows (lines) in a file using Python with the following method:
    n = 0
    for line in file('input.txt'):
        n += 1
    print n
I run the script under Windows.
Then I count the number of rows in the same file using the Unix command:
wc -l input.txt
Counting with the Unix command gives a larger number of rows.
So the question is: why does Python not see all the rows in the file? Or is it a question of definition?
Your file has one or more DOS EOF (Ctrl-Z) characters in it, ASCII codepoint 0x1A. When Windows opens a file in text mode, it'll still honour the old DOS semantics and end the file whenever it reads that character. See "Line reading chokes on 0x1A".
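To check whether this is really what is happening, you could scan the file for 0x1A bytes in binary mode. This is only a rough sketch; find_ctrl_z is a name I made up for illustration, not something from the standard library:

```python
def find_ctrl_z(filename, buf_size=2 ** 15):
    """Return the byte offsets of every 0x1A (Ctrl-Z) byte in the file.

    Hypothetical helper for diagnosis only.
    """
    offsets = []
    base = 0  # offset of the current chunk within the file
    with open(filename, 'rb') as f:  # binary mode: no EOF or newline handling
        while True:
            buf = f.read(buf_size)
            if not buf:
                break
            i = buf.find(b'\x1a')
            while i != -1:
                offsets.append(base + i)
                i = buf.find(b'\x1a', i + 1)
            base += len(buf)
    return offsets
```

If the returned list is non-empty, the first offset is where a text-mode read on Windows would be expected to stop.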
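Only by opening the file in binary mode can you bypass this behaviour. To do so and still count lines, you have two options:
Read in chunks, then count the number of line separators in each chunk:
    import os
    from functools import partial

    # linesep must be bytes, because the file is read in binary mode
    def bufcount(filename, linesep=os.linesep.encode('ascii'), buf_size=2 ** 15):
        lines = 0
        with open(filename, 'rb') as f:
            last = b''
            # read buf_size bytes at a time until an empty read signals EOF
            for buf in iter(partial(f.read, buf_size), b''):
                lines += buf.count(linesep)
                if last and last + buf[:1] == linesep:
                    # count line separators straddling a chunk boundary
                    lines += 1
                if len(linesep) > 1:
                    # remember the final byte for the next boundary check
                    last = buf[-1:]
        return lines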
Take into account that on Windows os.linesep is set to \r\n; adjust as needed for your file, because in binary mode line separators are not translated to \n.
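If you don't know which separator a given file uses, one option is to sniff the start of the file in binary mode. This is a heuristic sketch only; detect_linesep and its sample_size parameter are hypothetical names, and it assumes the file uses one separator consistently:

```python
def detect_linesep(filename, sample_size=2 ** 15):
    """Guess the line separator from the first sample_size bytes.

    Hypothetical helper; returns bytes, or None if no separator is seen.
    """
    with open(filename, 'rb') as f:  # binary mode: separators are untouched
        sample = f.read(sample_size)
    # Check \r\n first, since such a sample also contains bare \n bytes.
    if b'\r\n' in sample:
        return b'\r\n'
    if b'\n' in sample:
        return b'\n'
    if b'\r' in sample:
        return b'\r'
    return None
```

The result can then be passed as the separator to a binary-mode counting routine instead of relying on os.linesep.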
Use io.open() instead; the io set of file objects always opens the file in binary mode and does the newline translations itself:
    import io

    with io.open(filename) as f:
        lines = sum(1 for line in f)
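One thing to watch for: io.open() in text mode also decodes the bytes, so a file with stray non-text bytes may raise a decoding error. A minimal sketch that sidesteps this, assuming you only care about the line count and not the text itself, is to pass encoding='latin-1', which maps every byte to a character. The function name count_lines_io is made up for illustration:

```python
import io

def count_lines_io(filename, encoding='latin-1'):
    """Count lines via io.open(); hypothetical wrapper for illustration.

    latin-1 is an assumption chosen so that decoding never fails;
    it is not necessarily the real encoding of the file.
    """
    with io.open(filename, encoding=encoding) as f:
        # Default newline handling treats \n, \r and \r\n as line endings.
        return sum(1 for line in f)
```

A 0x1A byte is just another character here, so it does not cut the count short.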