pandas - Python: recursively find the distance between points in a group

- September 15, 2013

I can apply vincenty from geopy to my DataFrame in pandas and determine the distance between two consecutive machines. However, I want to find the distance between all machines in a group without repeating any pair.

For example, if I group by company name and there are 3 machines associated with a company, I want to find the distance between machines (1, 2), (1, 3) and (2, 3), but not calculate the distance between (2, 1) and (3, 1), since distance is symmetric (identical results).
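The "without repeating" requirement describes unordered pairs, which `itertools.combinations` from the standard library enumerates. A minimal sketch, using three hypothetical machine IDs for a single company:

```python
from itertools import combinations

machines = [1, 2, 3]  # hypothetical machine IDs for one company

# combinations() yields each unordered pair exactly once,
# so (2, 1) never appears after (1, 2)
pairs = list(combinations(machines, 2))
print(pairs)  # [(1, 2), (1, 3), (2, 3)]

# n machines produce n * (n - 1) / 2 pairs
n = len(machines)
assert len(pairs) == n * (n - 1) // 2
```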

import pandas as pd
from geopy.distance import vincenty

df = pd.DataFrame({'ser_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
                   'co_nm': ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc'],
                   'lat': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'lon': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]})

coord_col = ['lat', 'lon']
# True where a row has the same company as the previous row
matching_cust = df['co_nm'] == df['co_nm'].shift(1)
shift_coords = df.shift(1).loc[matching_cust, coord_col]
# join in the shifted coords and compute the distance
df_shift = df.join(shift_coords, how='inner', rsuffix='_2')
# return the distance in miles; columns are looked up by label
# so the result does not depend on column order
df['dist'] = df_shift.apply(lambda x: vincenty((x['lat'], x['lon']),
                                               (x['lat_2'], x['lon_2'])).mi, axis=1)

This finds the distance between consecutive machines in a group. How can I expand on this to find the distance between all machines in a group?

This code returns:

  co_nm  lat  lon  ser_no      dist
0    aa    1   21       1       NaN
1    aa    2   22       2  97.47832
2    aa    3   23       3  97.44923
3    bb    4   24       4       NaN
4    bb    5   25       5  97.34752
5    bb    6   26       6  97.27497
6    bb    7   27       7  97.18804
7    cc    8   28       8       NaN
8    cc    9   29       9  96.97129
9    cc   10   30       0  96.84163

Edit:

The desired output would find the unique distance combinations for machines related to the same company; that is, co_nm aa would have the distances between ser_no (1,2), (1,3) and (2,3), and likewise for the machines in co_nm bb and cc, but it wouldn't determine the distance between machines in different co_nm groups.

Does that make sense?

Update 2: using a function:

from itertools import combinations

def calc_dist(df):
    # one row per unordered pair of machines within each company
    return pd.DataFrame(
               [ [grp,
                  df.loc[c[0]].ser_no,
                  df.loc[c[1]].ser_no,
                  vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
                 ]
                 for grp,lst in df.groupby('co_nm').groups.items()
                 for c in combinations(lst, 2)
               ],
               columns=['co_nm','machinea','machineb','distance'])

In [27]: calc_dist(df)
Out[27]:
   co_nm  machinea  machineb               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

Update:

In [9]: dist = pd.DataFrame(
   ...:   [ [grp,
   ...:      df.loc[c[0]].ser_no,
   ...:      df.loc[c[1]].ser_no,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for grp,lst in df.groupby('co_nm').groups.items()
   ...:     for c in combinations(lst, 2)
   ...:   ],
   ...:   columns=['co_nm','machinea','machineb','distance'])

In [10]: dist
Out[10]:
   co_nm  machinea  machineb               distance
0     aa         1         2  156.87614939082016 km
1     aa         1         3   313.7054454472326 km
2     aa         2         3    156.829329105069 km
3     cc         8         9  156.06016539095216 km
4     cc         8         0   311.9109981692541 km
5     cc         9         0  155.85149813446617 km
6     bb         4         5  156.66564183673603 km
7     bb         4         6   313.2143330250297 km
8     bb         4         7   469.6225353388079 km
9     bb         5         6  156.54889741438788 km
10    bb         5         7  312.95759746593706 km
11    bb         6         7   156.4089967703544 km

Explanation: the combination part

In [11]: [c
   ....:  for grp,lst in df.groupby('co_nm').groups.items()
   ....:  for c in combinations(lst, 2)]
Out[11]:
[(0, 1),
 (0, 2),
 (1, 2),
 (7, 8),
 (7, 9),
 (8, 9),
 (3, 4),
 (3, 5),
 (3, 6),
 (4, 5),
 (4, 6),
 (5, 6)]
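To see what the combination part iterates over without pandas, here is a rough sketch that builds a plain dict as a stand-in for `df.groupby('co_nm').groups` (a mapping from each company to the index labels of its rows). The list positions play the role of the DataFrame index, and the `groups` dict is built by hand purely for illustration:

```python
from itertools import combinations

# Same co_nm column as in the question; positions 0..9 stand in for the index
co_nm = ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc']

# Hand-built stand-in for df.groupby('co_nm').groups: {company: [row positions]}
groups = {}
for pos, name in enumerate(co_nm):
    groups.setdefault(name, []).append(pos)

# Pairs are formed inside each group only, so no pair crosses companies
index_pairs = [c
               for grp, lst in groups.items()
               for c in combinations(lst, 2)]
print(index_pairs)
```

The groups may come out in a different order than in the output above (there it was aa, cc, bb), but the set of pairs is the same: 3 for aa, 6 for bb and 3 for cc.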

Old answer:

In [3]: from itertools import combinations

In [4]: import pandas as pd

In [5]: from geopy.distance import vincenty

In [6]: df = pd.DataFrame({'machine': [1,2,3], 'lat': [11, 12, 13], 'lon': [21,22,23]})

In [7]: df
Out[7]:
   lat  lon  machine
0   11   21        1
1   12   22        2
2   13   23        3

In [8]: dist = pd.DataFrame(
   ...:   [ [df.loc[c[0]].machine,
   ...:      df.loc[c[1]].machine,
   ...:      vincenty(df.loc[c[0], ['lat','lon']], df.loc[c[1], ['lat','lon']])
   ...:     ]
   ...:     for c in combinations(df.index, 2)
   ...:   ],
   ...:   columns=['machinea','machineb','distance'])

In [9]: dist
Out[9]:
   machinea  machineb               distance
0         1         2   155.3664523771998 km
1         1         3   310.4557192973811 km
2         2         3  155.09044419651156 km
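The distance column in the outputs above holds geopy distance objects, which print with a "km" suffix. If plain floats are preferred, the same per-group pairing can be sketched with the standard library alone. Note that this swaps in the haversine formula (a spherical approximation) in place of Vincenty's ellipsoidal method, so the numbers will differ slightly from those shown above. The names `haversine_km`, `pairwise_by_group` and `rows` are made up for this illustration:

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# (ser_no, co_nm, lat, lon), same values as the question's DataFrame
rows = [(1, 'aa', 1, 21), (2, 'aa', 2, 22), (3, 'aa', 3, 23),
        (4, 'bb', 4, 24), (5, 'bb', 5, 25), (6, 'bb', 6, 26), (7, 'bb', 7, 27),
        (8, 'cc', 8, 28), (9, 'cc', 9, 29), (0, 'cc', 10, 30)]

def pairwise_by_group(rows):
    """Return (co_nm, machine_a, machine_b, km) for each unordered pair per company."""
    groups = {}
    for ser_no, co_nm, lat, lon in rows:
        groups.setdefault(co_nm, []).append((ser_no, (lat, lon)))
    return [(co_nm, a, b, haversine_km(pa, pb))
            for co_nm, members in groups.items()
            for (a, pa), (b, pb) in combinations(members, 2)]

result = pairwise_by_group(rows)
for co_nm, a, b, km in result:
    print(co_nm, a, b, round(km, 1))
```

Because the distance is symmetric, `haversine_km(p, q)` and `haversine_km(q, p)` agree, which is why each pair only needs to be computed once.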
