2017-09-28 1 views
2

J'ai une question simple que je ne peux pas sembler trouver une réponse de détroit pour. Supposons que j'ai une trame de données avec date, ouvert, haut, bas, fermeture et volume.Pandas Sélectionnez les 20 derniers jours de données.

Ce que je suis en train de faire est de trouver d'abord la date que je peux faire avec:

Mon problème vient dans le choix des 20 derniers jours de données à partir de la date actuelle.

Je dois sélectionner les 20 dernières lignes car j'ai besoin de trouver les valeurs les plus hautes et les plus basses dans la colonne proche de cet ensemble de données.

Les pointeurs aideraient beaucoup. Iv a cherché google pendant un moment et continue à trouver des réponses différentes.

Merci!

Répondre

0

Un exemple est le code ci-dessous.

import pandas as pd 
import numpy as np 

today = pd.datetime.today().date() 

df = pd.DataFrame({"date": pd.date_range('20170901','20170930', freq='D'), 
        "open": np.random.rand(30), 
        "high": np.random.rand(30), 
        "low": np.random.rand(30), 
        "close": np.random.rand(30), 
        "volume": np.random.rand(30)}, 
        columns = ["date", "open", "high", "low", "close", "volume"]) 
selected_date = pd.date_range(today - pd.to_timedelta(20, unit='d'), today, freq='D') 
df_selected = df[df["date"].isin(selected_date)] 
# Out[40]: 
#   date  open  high  low  close volume 
# 7 2017-09-08 0.790424 0.999621 0.139619 0.669588 0.476784 
# 8 2017-09-09 0.190239 0.439975 0.362905 0.018472 0.905773 
# 9 2017-09-10 0.184327 0.686411 0.124636 0.741130 0.132774 
# 10 2017-09-11 0.346019 0.022173 0.422704 0.159098 0.011801 
# 11 2017-09-12 0.549928 0.228514 0.851650 0.824209 0.756816 
# 12 2017-09-13 0.413550 0.994019 0.340958 0.905432 0.289316 
# 13 2017-09-14 0.435034 0.485978 0.768520 0.534148 0.276084 
# 14 2017-09-15 0.839840 0.775490 0.481123 0.911378 0.928908 
# 15 2017-09-16 0.442393 0.512893 0.519516 0.844619 0.813230 
# 16 2017-09-17 0.723789 0.646345 0.081776 0.388496 0.391421 
# 17 2017-09-18 0.964289 0.849776 0.156879 0.663885 0.062165 
# 18 2017-09-19 0.001000 0.174666 0.694151 0.777330 0.739554 
# 19 2017-09-20 0.426997 0.541273 0.789910 0.218263 0.748694 
# 20 2017-09-21 0.217904 0.295377 0.087909 0.765242 0.555663 
# 21 2017-09-22 0.910734 0.848182 0.476946 0.374580 0.079900 
# 22 2017-09-23 0.160963 0.795219 0.956262 0.744048 0.645552 
# 23 2017-09-24 0.412634 0.722252 0.226693 0.524794 0.910259 
# 24 2017-09-25 0.535072 0.131761 0.931164 0.618055 0.542512 
# 25 2017-09-26 0.697222 0.552784 0.537899 0.773403 0.916538 
# 26 2017-09-27 0.257628 0.479550 0.539444 0.540076 0.344933 
# 27 2017-09-28 0.270114 0.914036 0.137004 0.939907 0.736016 

De plus, le maximum et le minimum de l'objet fermé sont obtenus comme suit.

df_max = df_selected[df_selected['close'] == df_selected['close'].max()] 
# Out[48]: 
#   date  open  high  low  close volume 
# 27 2017-09-28 0.270114 0.914036 0.137004 0.939907 0.736016 

df_min = df_selected[df_selected['close'] == df_selected['close'].min()] 
# Out[49]: 
#   date  open  high  low  close volume 
# 8 2017-09-09 0.190239 0.439975 0.362905 0.018472 0.905773 
1

Si vous voulez simplement les 20 dernières lignes d'un DataFrame vous pouvez simplement utiliser df[-20:]. Au lieu de cela, si vous voulez avoir la date il y a 20 jours, vous devez faire pd.Timedelta(-19, unit='d') + pd.datetime.today().date().

In [1]: index = pd.date_range(start=(pd.Timedelta(-30, unit='d')+pd.datetime.today().date()), periods=31) 

In [2]: df = pd.DataFrame(np.random.rand(31, 4), index=index, columns=['O', 'H', 'L', 'C']) 

In [3]: df = df.reset_index().rename(columns={'index': 'Date'}) 

In [4]: df 
Out[4]: 
     Date   O   H   L   C 
0 2017-08-28 0.616856 0.518961 0.378005 0.716371 
1 2017-08-29 0.300977 0.652217 0.713013 0.842369 
2 2017-08-30 0.875668 0.232998 0.566047 0.969647 
3 2017-08-31 0.273934 0.086575 0.386617 0.390749 
4 2017-09-01 0.667561 0.336419 0.648809 0.619215 
5 2017-09-02 0.988234 0.563675 0.402908 0.671333 
6 2017-09-03 0.111710 0.549302 0.321546 0.201828 
7 2017-09-04 0.469041 0.736152 0.345069 0.336593 
8 2017-09-05 0.674844 0.276839 0.350289 0.862777 
9 2017-09-06 0.128124 0.968918 0.713846 0.415061 
10 2017-09-07 0.920488 0.252980 0.573531 0.270999 
11 2017-09-08 0.113368 0.781649 0.190273 0.758834 
12 2017-09-09 0.414453 0.545572 0.761805 0.586717 
13 2017-09-10 0.348459 0.830177 0.779591 0.783887 
14 2017-09-11 0.571877 0.230465 0.262744 0.360188 
15 2017-09-12 0.844286 0.821388 0.312319 0.473672 
16 2017-09-13 0.605548 0.570590 0.457141 0.882498 
17 2017-09-14 0.242154 0.066617 0.028913 0.969698 
18 2017-09-15 0.725521 0.742362 0.904866 0.890942 
19 2017-09-16 0.460858 0.749581 0.429131 0.723394 
20 2017-09-17 0.767445 0.452113 0.906294 0.978368 
21 2017-09-18 0.342970 0.702579 0.029031 0.743489 
22 2017-09-19 0.221478 0.339948 0.403478 0.349097 
23 2017-09-20 0.147785 0.633542 0.692545 0.194496 
24 2017-09-21 0.656189 0.419257 0.099094 0.708530 
25 2017-09-22 0.329901 0.087101 0.683207 0.558431 
26 2017-09-23 0.902550 0.155262 0.304506 0.756210 
27 2017-09-24 0.072132 0.045242 0.058175 0.755649 
28 2017-09-25 0.149873 0.340870 0.198454 0.725051 
29 2017-09-26 0.972721 0.505842 0.886602 0.231916 
30 2017-09-27 0.511109 0.990975 0.330336 0.898291 

In [5]: df[-20:] 
Out[5]: 
     Date   O   H   L   C 
11 2017-09-08 0.113368 0.781649 0.190273 0.758834 
12 2017-09-09 0.414453 0.545572 0.761805 0.586717 
13 2017-09-10 0.348459 0.830177 0.779591 0.783887 
14 2017-09-11 0.571877 0.230465 0.262744 0.360188 
15 2017-09-12 0.844286 0.821388 0.312319 0.473672 
16 2017-09-13 0.605548 0.570590 0.457141 0.882498 
17 2017-09-14 0.242154 0.066617 0.028913 0.969698 
18 2017-09-15 0.725521 0.742362 0.904866 0.890942 
19 2017-09-16 0.460858 0.749581 0.429131 0.723394 
20 2017-09-17 0.767445 0.452113 0.906294 0.978368 
21 2017-09-18 0.342970 0.702579 0.029031 0.743489 
22 2017-09-19 0.221478 0.339948 0.403478 0.349097 
23 2017-09-20 0.147785 0.633542 0.692545 0.194496 
24 2017-09-21 0.656189 0.419257 0.099094 0.708530 
25 2017-09-22 0.329901 0.087101 0.683207 0.558431 
26 2017-09-23 0.902550 0.155262 0.304506 0.756210 
27 2017-09-24 0.072132 0.045242 0.058175 0.755649 
28 2017-09-25 0.149873 0.340870 0.198454 0.725051 
29 2017-09-26 0.972721 0.505842 0.886602 0.231916 
30 2017-09-27 0.511109 0.990975 0.330336 0.898291 

In [6]: df[df.Date.isin(pd.date_range(pd.Timedelta(-19, unit='d')+pd.datetime.today().date(), periods=20))] 
Out[6]: 
     Date   O   H   L   C 
11 2017-09-08 0.113368 0.781649 0.190273 0.758834 
12 2017-09-09 0.414453 0.545572 0.761805 0.586717 
13 2017-09-10 0.348459 0.830177 0.779591 0.783887 
14 2017-09-11 0.571877 0.230465 0.262744 0.360188 
15 2017-09-12 0.844286 0.821388 0.312319 0.473672 
16 2017-09-13 0.605548 0.570590 0.457141 0.882498 
17 2017-09-14 0.242154 0.066617 0.028913 0.969698 
18 2017-09-15 0.725521 0.742362 0.904866 0.890942 
19 2017-09-16 0.460858 0.749581 0.429131 0.723394 
20 2017-09-17 0.767445 0.452113 0.906294 0.978368 
21 2017-09-18 0.342970 0.702579 0.029031 0.743489 
22 2017-09-19 0.221478 0.339948 0.403478 0.349097 
23 2017-09-20 0.147785 0.633542 0.692545 0.194496 
24 2017-09-21 0.656189 0.419257 0.099094 0.708530 
25 2017-09-22 0.329901 0.087101 0.683207 0.558431 
26 2017-09-23 0.902550 0.155262 0.304506 0.756210 
27 2017-09-24 0.072132 0.045242 0.058175 0.755649 
28 2017-09-25 0.149873 0.340870 0.198454 0.725051 
29 2017-09-26 0.972721 0.505842 0.886602 0.231916 
30 2017-09-27 0.511109 0.990975 0.330336 0.898291 
0

Configuration

today = pd.datetime.today().date() 

df = pd.DataFrame(
    np.random.rand(20, 4), 
    pd.date_range(end=today, periods=20, freq='3D'), 
    columns=['O', 'H', 'L', 'C']) 

df 

        O   H   L   C 
2017-08-01 0.821996 0.894122 0.829814 0.429701 
2017-08-04 0.883512 0.668642 0.524440 0.914845 
2017-08-07 0.035753 0.231787 0.421547 0.163865 
2017-08-10 0.742781 0.293591 0.874033 0.054421 
2017-08-13 0.252422 0.632991 0.547044 0.650622 
2017-08-16 0.316752 0.190016 0.504701 0.827450 
2017-08-19 0.777069 0.533121 0.329742 0.603473 
2017-08-22 0.843260 0.546845 0.600270 0.060620 
2017-08-25 0.834180 0.395653 0.189499 0.820043 
2017-08-28 0.806369 0.850968 0.753335 0.902687 
2017-08-31 0.336096 0.145325 0.876519 0.114923 
2017-09-03 0.590195 0.946520 0.009151 0.832992 
2017-09-06 0.901101 0.616852 0.375829 0.332625 
2017-09-09 0.537892 0.852527 0.082807 0.966297 
2017-09-12 0.104929 0.803415 0.345942 0.245934 
2017-09-15 0.085703 0.743497 0.256762 0.530267 
2017-09-18 0.823960 0.397983 0.173706 0.091678 
2017-09-21 0.211412 0.980942 0.833802 0.763510 
2017-09-24 0.312950 0.850760 0.913519 0.846466 
2017-09-27 0.921168 0.568595 0.460656 0.016313 

Solution
Utilisez pandas géants de découpage en tranches d'index datetime. Très simple et direct et ce que les pandas devs destiné à ce problème. Note: Cela ne fait pas combien de lignes sont dans les 20 derniers jours, il les attrape tous. C'est ce que je pense que l'OP voulait.

df[today - pd.offsets.Day(20):] 

        O   H   L   C 
2017-09-09 0.537892 0.852527 0.082807 0.966297 
2017-09-12 0.104929 0.803415 0.345942 0.245934 
2017-09-15 0.085703 0.743497 0.256762 0.530267 
2017-09-18 0.823960 0.397983 0.173706 0.091678 
2017-09-21 0.211412 0.980942 0.833802 0.763510 
2017-09-24 0.312950 0.850760 0.913519 0.846466 
2017-09-27 0.921168 0.568595 0.460656 0.016313 
+1

Monsieur, vous aurez un upvote par 8 heures de moi. Tellement à apprendre. – Dark