python - Pandas read_fwf: specify dtype
I am reading in a huge fixed-width text file in chunks and exporting the data to CSV. Because pandas.read_fwf does not allow me to specify dtypes, I am wondering what other ways exist to force the columns to strings. The reason is that pandas infers some columns as float even though they are not, and I do not want the .0 within those columns.
I am using data[column] = data[column].astype(str),
but that does not get rid of the decimals. Converting the columns of float64 dtype to int doesn't work either, since NAs cannot be converted. Any ideas?
Here's a snippet of my code:

dat = pd.read_fwf(file_to_read, colspecs=cols, header=None,
                  chunksize=100000, names=header)

# first chunk
data.info()
# Int64Index: 100000 entries, 0 to 99999
# Columns: 562 entries
# dtypes: float64(405), int64(4), object(153)
# memory usage: 429.5+ MB

for column in data.columns:
    if data[column].dtype == 'float64':
        data[column] = data[column].astype(int)
    else:
        pass
I also tried str().replace('.0', ''), but I want to find an easier way, since iterating through every column takes a lot of time.
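As a side note on why the two attempts above fall short, here is a minimal sketch (the column name code is made up for illustration): astype(str) keeps the trailing .0 and renders NaN as the literal string 'nan', while a per-value formatter can emit clean integer strings and leave NAs empty:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'code': [1.0, 3.0, np.nan]})

# astype(str) keeps the trailing .0 and turns NaN into the string 'nan'
print(df['code'].astype(str).tolist())   # ['1.0', '3.0', 'nan']

# One workaround: format each value as an integer string, leaving NaN empty
as_str = df['code'].map(lambda v: '' if pd.isna(v) else str(int(v)))
print(as_str.tolist())                   # ['1', '3', '']
```

This still touches every value, though, so it does not solve the speed problem; preventing the float inference at read time (as in the answer below) is cheaper.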
The converters parameter can be used to preserve the data as strings, since pd.read_fwf does not try to guess the dtype when a converter is specified:
import pandas as pd
try:
    # Python 2
    from cStringIO import StringIO
except ImportError:
    # Python 3
    from io import StringIO

content = '''\
1.0 2 a
3.0 4 b
5   x c
m   y d
'''

header = ['foo', 'bar', 'baz']
for df in pd.read_fwf(StringIO(content), header=None, chunksize=2,
                      names=header, converters={h: str for h in header}):
    print(df)

df.info()
yields
   foo bar baz
0  1.0   2   a
1  3.0   4   b
   foo bar baz
0    5   x   c
1    m   y   d
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
foo    2 non-null object
bar    2 non-null object
baz    2 non-null object
dtypes: object(3)
memory usage: 120.0+ bytes
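For what it's worth, newer pandas versions forward extra keyword arguments from read_fwf to the underlying parser, so passing dtype=str directly may also work and spares you the converters dict. This is a sketch assuming a pandas version where read_fwf accepts dtype (older versions raised an error on it):

```python
import pandas as pd
from io import StringIO

content = '''\
1.0 2 a
3.0 4 b
'''

# dtype=str asks the parser to keep every column as strings
df = pd.read_fwf(StringIO(content), header=None,
                 names=['foo', 'bar', 'baz'], dtype=str)
print(df.dtypes)  # all three columns come back as object
```

If your pandas version rejects dtype here, the converters approach above remains the portable fallback.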