performance - How to speed up Pandas multilevel dataframe shift by group?


I am trying to shift the data in a Pandas DataFrame by group, where the groups are defined by the first index level. Here is the demo code:

    In [8]: df = mul_df(5,4,3)

    In [9]: df
    Out[9]:
                     col000  col001  col002
    stk_id rpt_date
    a0000  b000     -0.5505  0.7445 -0.3645
           b001      0.9129 -1.0473 -0.5478
           b002      0.8016  0.0292  0.9002
           b003      2.0744 -0.2942 -0.7117
    a0001  b000      0.7064  0.9636  0.2805
           b001      0.4763  0.2741 -1.2437
           b002      1.1563  0.0525 -0.7603
           b003     -0.4334  0.2510 -0.0105
    a0002  b000     -0.6443  0.1723  0.2657
           b001      1.0719  0.0538 -0.0641
           b002      0.6787 -0.3386  0.6757
           b003     -0.3940 -1.2927  0.3892
    a0003  b000     -0.5862 -0.6320  0.6196
           b001     -0.1129 -0.9774  0.7112
           b002      0.6303 -1.2849 -0.4777
           b003      0.5046 -0.4717 -0.2133
    a0004  b000      1.6420 -0.9441  1.7167
           b001      0.1487  0.1239  0.6848
           b002      0.6139 -1.9085 -1.9508
           b003      0.3408 -1.3891  0.6739

    In [10]: grp = df.groupby(level=df.index.names[0])

    In [11]: grp.shift(1)
    Out[11]:
                     col000  col001  col002
    stk_id rpt_date
    a0000  b000         NaN     NaN     NaN
           b001     -0.5505  0.7445 -0.3645
           b002      0.9129 -1.0473 -0.5478
           b003      0.8016  0.0292  0.9002
    a0001  b000         NaN     NaN     NaN
           b001      0.7064  0.9636  0.2805
           b002      0.4763  0.2741 -1.2437
           b003      1.1563  0.0525 -0.7603
    a0002  b000         NaN     NaN     NaN
           b001     -0.6443  0.1723  0.2657
           b002      1.0719  0.0538 -0.0641
           b003      0.6787 -0.3386  0.6757
    a0003  b000         NaN     NaN     NaN
           b001     -0.5862 -0.6320  0.6196
           b002     -0.1129 -0.9774  0.7112
           b003      0.6303 -1.2849 -0.4777
    a0004  b000         NaN     NaN     NaN
           b001      1.6420 -0.9441  1.7167
           b002      0.1487  0.1239  0.6848
           b003      0.6139 -1.9085 -1.9508

The mul_df() code is attached in this question: How to speed up Pandas multilevel dataframe sum?
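The helper itself is not reproduced on this page. For readers who want to run the demo, a minimal stand-in might look like the sketch below. It is an assumption, not the original mul_df(): the name `make_multilevel_df`, its parameters, and the `seed` argument are hypothetical, and only the index and column labels are taken from the output shown above.

```python
import numpy as np
import pandas as pd


def make_multilevel_df(n_groups, rows_per_group, n_cols, seed=0):
    """Hypothetical stand-in for mul_df(); not the original helper.

    Builds a float DataFrame with a two-level index (stk_id, rpt_date)
    and n_cols columns of random normal data.
    """
    index = pd.MultiIndex.from_product(
        [
            ["a%04d" % i for i in range(n_groups)],
            ["b%03d" % j for j in range(rows_per_group)],
        ],
        names=["stk_id", "rpt_date"],
    )
    columns = ["col%03d" % k for k in range(n_cols)]
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((len(index), n_cols))
    return pd.DataFrame(data, index=index, columns=columns)


# Same shape as the small demo: 5 groups, 4 rows each, 3 columns.
df = make_multilevel_df(5, 4, 3)
```

Because the values are random, the numbers will not match the listing above; only the layout is intended to be the same.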

Now I want to run grp.shift(1) on a big DataFrame.

    In [1]: df = mul_df(5000,30,400)

    In [2]: grp = df.groupby(level=df.index.names[0])

    In [3]: timeit grp.shift(1)
    1 loops, best of 3: 5.23 s per loop

5.23 s is slow. How can I speed it up?

(My computer configuration is: Pentium Dual-Core T4200 @ 2.00GHz, 3.00GB RAM, Windows XP, Python 2.7.4, NumPy 1.7.1, pandas 0.11.0, numexpr 2.0.1, Anaconda 1.5.0 (32-bit))

How about shifting the whole DataFrame object and then setting the first row of every group to NaN?

    dfs = df.shift(1)
    # cumsum of the group sizes gives the position where each later group starts
    dfs.iloc[df.groupby(level=0).size().cumsum()[:-1]] = np.nan
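The shift-then-mask idea can be checked against the grouped shift on a small frame. The sketch below is illustrative only: the frame and the names `sizes`, `group_starts`, and `expected` are made up for the example. It assumes the rows of each group are contiguous and sorted on the first index level, as in the demo above. It also converts the positions to a NumPy array before passing them to `.iloc`, which is a choice made here and not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: 3 groups of 4 rows, sorted by the first level.
index = pd.MultiIndex.from_product(
    [["a0000", "a0001", "a0002"], ["b000", "b001", "b002", "b003"]],
    names=["stk_id", "rpt_date"],
)
df = pd.DataFrame(
    np.arange(36, dtype=float).reshape(12, 3),
    index=index,
    columns=["col000", "col001", "col002"],
)

# Reference result: shift within each group.
expected = df.groupby(level=0).shift(1)

# Faster idea: shift the whole frame once, then blank the first row
# of every group. Row 0 is already NaN after the global shift, so only
# the starting positions of the later groups need to be overwritten.
sizes = df.groupby(level=0).size()
group_starts = sizes.cumsum().to_numpy()[:-1]

dfs = df.shift(1)
dfs.iloc[group_starts] = np.nan

print(dfs.equals(expected))
```

If the index is not sorted, or a group's rows are scattered, the cumulative sizes no longer point at the group boundaries and the two results can differ.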
