Why does Python not see all the rows in a file?


I count the number of rows (lines) in a file using Python in the following way:

    n = 0
    for line in file('input.txt'):
        n += 1
    print n

I run the script under Windows.

Then I count the number of rows in the same file using the Unix command:

    wc -l input.txt

Counting with the Unix command gives a larger number of rows.

So the question is: why does Python not see all the rows in the file? Or is it a question of definition?

You have a file with one or more DOS EOF (Ctrl-Z) characters in it, ASCII codepoint 0x1A. When a file is opened in text mode on Windows, the old DOS semantics are still honoured and the file ends as soon as that character is read. See "Line reading chokes on 0x1A".
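
To check whether this is what is happening, the file can be scanned in binary mode for 0x1A bytes. The following is only a minimal sketch, assuming the file fits comfortably in memory; the helper name `find_ctrl_z` is hypothetical and not part of any library:

```python
def find_ctrl_z(filename):
    """Return the byte offsets of every 0x1A (Ctrl-Z) byte in the file."""
    # Binary mode: bytes are returned as-is, with no DOS EOF handling.
    with open(filename, 'rb') as f:
        data = f.read()
    offsets = []
    pos = data.find(b'\x1a')
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b'\x1a', pos + 1)
    return offsets
```

An empty list means the file contains no Ctrl-Z bytes, in which case the discrepancy probably has another cause.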

Only opening the file in binary mode bypasses this behaviour. To do so and still count lines, you have two options:

  • Read in chunks, then count the number of line separators in each chunk:

        import os

        def bufcount(filename, linesep=os.linesep, buf_size=2 ** 15):
            lines = 0
            with open(filename, 'rb') as f:
                last = ''
                # read buf_size bytes at a time until read() returns ''
                for buf in iter(lambda: f.read(buf_size), ''):
                    lines += buf.count(linesep)
                    if last and last + buf[0] == linesep:
                        # count line separators straddling a chunk boundary
                        lines += 1
                    if len(linesep) > 1:
                        last = buf[-1]
            return lines

    Take into account that on Windows os.linesep is set to \r\n; adjust as needed for your file. In binary mode, line separators are not translated to \n.

  • Use io.open(); the io set of file objects always opens your file in binary mode and then does the translations itself:

        import io

        with io.open(filename) as f:
            lines = sum(1 for line in f)
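
For a number directly comparable with `wc -l`, which counts newline characters, one option is to count `\n` bytes in binary mode. A single-byte separator cannot be split across two chunks, so no boundary handling is needed. This is a minimal sketch rather than a definitive implementation, written to run on both Python 2.6+ and Python 3; the name `count_newlines` is hypothetical:

```python
from functools import partial


def count_newlines(filename, buf_size=2 ** 15):
    """Count newline bytes, reading the file in binary mode in chunks."""
    total = 0
    with open(filename, 'rb') as f:
        # iter() keeps calling f.read(buf_size) until it returns b'' (end of file).
        for buf in iter(partial(f.read, buf_size), b''):
            total += buf.count(b'\n')
    return total
```

Because the file is opened with `'rb'`, an embedded 0x1A byte is treated like any other byte and does not cut the count short.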
