2017-10-16

Pandas - convert a cumulative value to the actual value

Let's say my dataframe looks like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0 

The count column at the end is a cumulative count. What I need to do is find the actual count for a particular date + site + country + kind + ID tuple, which would give:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count 
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0 
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0 
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0 
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0 
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0 
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0 
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0 
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0 
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0 
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0 
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0 
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0 

I know this would involve a groupby call, but I have no idea what to do beyond that. Assume that the very first instance of the tuple has a count of 0. Any help would be great. Thanks.

Answer


Use groupby + diff, the inverse of cumsum.

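# Columns that identify one series; the date is left out so rows are compared across days
cols = ['site', 'country_code', 'kind', 'ID']
# Difference between consecutive rows within each group; the first row of a group is NaN, so fill it with 0
df['count'] = df.groupby(cols)['count'].diff().fillna(0)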

print(df['count'])
0     0.0
1     0.0
2     0.0
3     1.0
4     0.0
5     0.0
6     0.0
7     0.0
8     3.0
9     0.0
10    3.0
11    2.0
Name: count, dtype: float64

Thanks to MaxU for pointing out the error!
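For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby + diff. Two things here are my own assumptions and not part of the original answer: the result is written to a new column named `daily_count` (a hypothetical name) so the cumulative values stay available, and the rows are sorted by date within each group first, since diff only makes sense if the cumulative counts are in chronological order.

```python
import io

import pandas as pd

# Sample data copied from the question
csv_text = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Columns that identify one series (everything in the tuple except the date)
cols = ["site", "country_code", "kind", "ID"]

# Assumption: make sure each series is in chronological order before differencing
df = df.sort_values(cols + ["date"]).reset_index(drop=True)

# Per-series difference; the first row of each series has no predecessor,
# so diff() yields NaN there and fillna(0) turns it into 0
# ("daily_count" is a hypothetical column name chosen for this sketch)
df["daily_count"] = df.groupby(cols)["count"].diff().fillna(0)

print(df[["date", "site", "count", "daily_count"]])
```

If you would rather overwrite the cumulative column as in the answer, assign the result back to `df['count']` instead.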


Thanks, but this will result in a value of 467 for the tuple (2017-02-15, website2, AU, 1, 91) when it should be 0 – Craig


I think the OP wants something like: `df.groupby('site')['count'].diff().fillna(0)` – MaxU


@MaxU Thanks a lot! I misread the question. –
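For context on the 467 in the first comment: that is the value you get at the boundary between the two sites when consecutive rows are differenced without respecting the groups (521 - 54 = 467). The thread does not show the earlier version of the answer, so the following is only a sketch of how such a value can arise, using just the site and count columns from the question.

```python
import pandas as pd

# Only the two columns needed for the illustration, values taken from the question
df = pd.DataFrame({
    "site": ["website1"] * 7 + ["website2"] * 5,
    "count": [53.0, 53.0, 53.0, 54.0, 54.0, 54.0, 54.0,
              521.0, 524.0, 524.0, 527.0, 529.0],
})

# Differencing the whole column ignores the group boundary:
# row 7 (first website2 row) is compared with row 6 (last website1 row)
ungrouped = df["count"].diff().fillna(0)

# Differencing within each site restarts at every group,
# so the first website2 row becomes NaN and is filled with 0
grouped = df.groupby("site")["count"].diff().fillna(0)

print(pd.DataFrame({"site": df["site"], "ungrouped": ungrouped, "grouped": grouped}))
```

The two results differ only at row 7, the first website2 row, which is exactly the row Craig pointed at.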