performance - How to speed up Pandas multilevel dataframe shift by group?
I am trying to shift the column data of a pandas DataFrame within each group of the first index level. Here is the demo code:
In [8]: df = mul_df(5,4,3)

In [9]: df
Out[9]:
                 col000  col001  col002
stk_id rpt_date
a0000  b000     -0.5505  0.7445 -0.3645
       b001      0.9129 -1.0473 -0.5478
       b002      0.8016  0.0292  0.9002
       b003      2.0744 -0.2942 -0.7117
a0001  b000      0.7064  0.9636  0.2805
       b001      0.4763  0.2741 -1.2437
       b002      1.1563  0.0525 -0.7603
       b003     -0.4334  0.2510 -0.0105
a0002  b000     -0.6443  0.1723  0.2657
       b001      1.0719  0.0538 -0.0641
       b002      0.6787 -0.3386  0.6757
       b003     -0.3940 -1.2927  0.3892
a0003  b000     -0.5862 -0.6320  0.6196
       b001     -0.1129 -0.9774  0.7112
       b002      0.6303 -1.2849 -0.4777
       b003      0.5046 -0.4717 -0.2133
a0004  b000      1.6420 -0.9441  1.7167
       b001      0.1487  0.1239  0.6848
       b002      0.6139 -1.9085 -1.9508
       b003      0.3408 -1.3891  0.6739

In [10]: grp = df.groupby(level=df.index.names[0])

In [11]: grp.shift(1)
Out[11]:
                 col000  col001  col002
stk_id rpt_date
a0000  b000         NaN     NaN     NaN
       b001     -0.5505  0.7445 -0.3645
       b002      0.9129 -1.0473 -0.5478
       b003      0.8016  0.0292  0.9002
a0001  b000         NaN     NaN     NaN
       b001      0.7064  0.9636  0.2805
       b002      0.4763  0.2741 -1.2437
       b003      1.1563  0.0525 -0.7603
a0002  b000         NaN     NaN     NaN
       b001     -0.6443  0.1723  0.2657
       b002      1.0719  0.0538 -0.0641
       b003      0.6787 -0.3386  0.6757
a0003  b000         NaN     NaN     NaN
       b001     -0.5862 -0.6320  0.6196
       b002     -0.1129 -0.9774  0.7112
       b003      0.6303 -1.2849 -0.4777
a0004  b000         NaN     NaN     NaN
       b001      1.6420 -0.9441  1.7167
       b002      0.1487  0.1239  0.6848
       b003      0.6139 -1.9085 -1.9508
The mul_df() code is attached here: How to speed up Pandas multilevel dataframe sum?
Now I want to run grp.shift(1) on a big dataframe.
In [1]: df = mul_df(5000,30,400)

In [2]: grp = df.groupby(level=df.index.names[0])

In [3]: timeit grp.shift(1)
1 loops, best of 3: 5.23 s per loop
5.23 s is too slow. How can I speed it up?
(My computer configuration is: Pentium Dual-Core T4200 @ 2.00GHz, 3.00GB RAM, Windows XP, Python 2.7.4, NumPy 1.7.1, pandas 0.11.0, numexpr 2.0.1, Anaconda 1.5.0 (32-bit))
How about shifting the whole DataFrame object and then setting the first row of every group to NaN?
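import numpy as np

dfs = df.shift(1)
# Row 0 is already NaN after the shift, so only the first row of each later group needs blanking.
dfs.iloc[df.groupby(level=0).size().cumsum()[:-1]] = np.nan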
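A minimal sketch for checking that this shortcut gives the same result as the grouped shift on a small frame. `make_frame` is a hypothetical stand-in for `mul_df()` (whose code is not shown here), and `shift_by_first_level` is only an illustrative name. The sketch assumes the index is sorted, so that each group's rows are contiguous, and it converts the offsets with `.to_numpy()` before handing them to `iloc`, which is my own small deviation from the one-liner.

```python
import numpy as np
import pandas as pd


def make_frame(n_groups, n_rows, n_cols, seed=0):
    """Hypothetical stand-in for mul_df(): random data on a sorted two-level index."""
    index = pd.MultiIndex.from_product(
        [
            ["a%04d" % i for i in range(n_groups)],
            ["b%03d" % j for j in range(n_rows)],
        ],
        names=["stk_id", "rpt_date"],
    )
    columns = ["col%03d" % k for k in range(n_cols)]
    rng = np.random.RandomState(seed)
    return pd.DataFrame(rng.randn(len(index), n_cols), index=index, columns=columns)


def shift_by_first_level(df):
    """Shift the whole frame down one row, then blank each group's first row.

    Assumption: the index is sorted, so every group occupies a contiguous
    block of rows in the same order that groupby reports its sizes.
    """
    shifted = df.shift(1)
    # Cumulative group sizes give the start position of every group but the first.
    # Row 0 is already NaN after the shift, so the last cumulative value is dropped.
    starts = df.groupby(level=0).size().cumsum().to_numpy()[:-1]
    shifted.iloc[starts] = np.nan
    return shifted


df = make_frame(5, 4, 3)
fast = shift_by_first_level(df)
slow = df.groupby(level=0).shift(1)

# Compare the shortcut against the grouped shift it is meant to replace.
print(fast.equals(slow))
```

If the index is not sorted, the group sizes no longer line up with row positions, so call `df.sort_index()` first or fall back to `groupby(...).shift(1)`.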